++G +S G+IL+HGV+ E T FNGI I GA + +Q RVLMLS+KAR DANPIL



Query: 349 LIDEKDVTAGHAASIGQVDPEDLYYLMSRGLNQKTAEQLVIRGPLGTVIAEIPVKEVRDE 408 

LIDE+DVTAGHAAS4G:t-+DP ++YLMSRG++4- AE+LVI GFL V+ ++P++ V++ 
Sbjct: 366 LIDEDDVTAGHAASVGKIDPIQMFYLMSRGISRAEAERLVIHGFIAPWGQLPIESVKER 425 

Query: 409 MIAVIDTKLE 418 

++ 1+ K++ 
Sbjct: 426 LVEAIERKVK 435 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5783> which encodes the amino acid 
sequence <SEQ ID 5784>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 387 - 403 ( 387 - 403) 

Final Results 

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB15259 GB:Z99120 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 177/428 (41%) , Positives = 267/428 (62%) , Gaps = 15/428 (3%) 

KEKLVAFSQAHAEPAWLQERRLAALEAIPNLELPTIERVKFHRWNLGDGT- -LTENESLA 6 0 
+E L +FS+ H EPAWL+ RL ALE +L +P ++ K WN + +NE L+ 

QEYLKSFSEKHQEPAWLKNLRLQALEQAEDLPMFKPDKTKITNWNFTNFAKHTVDNEPLS 70 





3 


Sbjct: 






61 


Sbjct: 


71 




112 


Sb j Ct : 


131 




171 


Sbjct: 


190 


Sbjct: 


231 




290 


Sbjct: 


309 




350 


Sbjct: 


369 


Query: 


410 


Sbjct: 


429 



DE KL A H A N A LYVP -t 



L++A S TY+E + S N + NI EVI + + + A+D L VTTY++R 



+D+ I+WAL +MN+G4 I++ ++L G G+ D K V 



+G+ + G+IL+HGV+ + + FNGIG I A A+A+QESRVLMLS++AR DANPILL 



IDE++VTAGHAAS+G+VDP +YYLMSRG+ +E AERLVI GFL V+ E+PI V++++ 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 322/420 (76%) , Positives = 368/420 (86%) 



Query: 1 MSKEAILNFLQAKGEPTWLQELRLKAFEKIEELELPVIERVKFHRWNLGDGTILENDYTA 60 
M+KE ++ F QA EP WLQE RL A E I LELP IERVKFHRWNLGDGT+ EN+ A 
Sbjct: 1 MTKEKLVAFSQAHAEPAWLQERRLAALEAIPNLELPTIERVKFHRWNLGDGTLTENESLA 60

Query: 61 nVPDFTELGNNPKLVQIGTQTVLEQVPMELIEKGWFTDFYSALEEIPEVIERYFGKARP 120 

+VPDF +G+NPKLVQ+GTQTVLEQ+PM LI+KGWF+DFY+ALEEIPEVIE +FG+A 
Sbjct: 61 SVPDFIAIGDNPKLVQVGTQTVLEQLPMALIDKGWFSDFYTALEEIPEVIEAHFGQALA 120

Query: 121 FEEDRLAAYHTAYFNSGAVLYIPDNVEITQPIEGLFYQDSQSKVPFNKHILLIVGKNAKV 180 

F+ED+LAAYHTAYFNS AVLY+PD++EIT PIE +F QDS S VPFNKH+L+I GK +K 
Sbjct: 121 FDEDKLAAYHTAYFNSAAVLYVPDHLEITTPIEAIFLQDSDSDVPFNKHVLVIAGKESKF 180

Query: 181 SYLERFESIGDGTERTSANISVEVIAQAGSQIKFASIDRLGENVTTFISRRGRHSSDATI 240 

+YLERFESIG+ T++ SANISVEVIAQAGSQIKF++IDRLG +VTT+ISRRGR DA I 
Sbjct: 181 TYLERFESIGNATQKISANISVEVIAQAGSQIKFSAIDRLGPSVTTYISRRGRLEKDANI 240 

Query: 241 DWALGV^GNWADFDSDLIGDGSHANLI<WARSSGRQVQGIDTRVTNYGCNSVGHILQ 300 

DWAL VMNEGNV+ADFDSDLIG GS A+LIWVAASSGRQVQGIDTRVTNYG +VGHILQ 
Sbjct: 241 DWAI^VMNEGOTIADFDSDLIGQGSQADLKOTJASSGRQVQGIDTRVTNYGQRTVGHILQ 300 

Query: 301 HGVILERGTLTFNGIGHIIRGAKGADAQQESRVLMLSDKARSDANPILLIDENDVTAGHA 360 

HGVILERGTLTFNGIGHI+K AKGADAQQESRVLMLSD+AR+DANPILLIDEN+VTAGHA
Sbjct: 301 HGVILERGTLTFNGIGHILtCDAKGADAQQESRVLMLSDQARADANPILLIDENEVTAGHA 360 

Query: 361 ASIGQVDPEDLYYLMSRGLNQKTAEQLVIRGFLGTVIAEIPVKEVRDEMIAVIDTKLEKR 420 

ASIGQVDPED+YYLMSRGL+Q+TAE+LVIRGFLG VIAEIP+ VR E+I V+D KL R 
Sbjct: 361 ASIGQVDPEDMYYLMSRGLDQETAERLVIRGFIiGAVIAEIPIPSVRQEIIKVLDEKLLNR 420 
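
The summary line that heads each alignment gives identities and positives as a count over the alignment length, followed by a percentage. As a minimal sketch, assuming only that fixed textual layout, the figures can be parsed and the percentages recomputed from the raw counts. The names `FIELD`, `parse_summary` and `recomputed_percent` are illustrative and are not part of any tool named in this document.

```python
import re

# Matches fields such as "Identities = 322/420 (76%)"; tolerant of extra spaces.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in FIELD.findall(line)
    }


def recomputed_percent(count, length):
    """Percentage implied by the raw counts, as a float."""
    return 100.0 * count / length


# Summary line of the GAS/GBS alignment shown above.
summary = parse_summary("Identities = 322/420 (76%) , Positives = 368/420 (86%)")
for name, (count, length, reported) in summary.items():
    implied = recomputed_percent(count, length)
    print(f"{name}: {count}/{length} reported {reported}%, recomputed {implied:.1f}%")
```

A printed percentage that differs from the recomputed value by more than a rounding step may point to a transcription error in one of the figures, so the comparison is best treated as a consistency check rather than a correction.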

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1861 

A DNA sequence (GBSx1968) was identified in S.agalactiae <SEQ ID 5785> which encodes the amino
acid sequence <SEQ ID 5786>. This protein is predicted to be ABC transporter, ATP-binding protein,
Ycf16 family. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2253 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15260 GB:Z99120 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 180/250 (72%) , Positives = 212/250 (84%) 

Query: 2 SVLEIKNLHVSIEDKEILKGLNLTLKTGEIAAIMGPNGTGKSTLSAAIMGNPNYEVTAGE 61
S L IK+LHV IE KEILKG+NL +K GE A+MGPNGTGKSTLSAAIMG+P YEVT G 
Sbjct: 4 STLTIKDLHVEIEGKEILKGVNLEIKGGEFHAVMGPNGTGKSTLSAAIMGHPKYEVTKGS 63 

Query: 62 ILFDGEDILELEVDERARLGLFLAMQYPSEVPGITNAEFIRAAMNAGKADDDKISIRQFI 121

I DG+D+LE+EVDERA+ GLFLAMQYPSE+ G+TNA+F+R+A+NA + + D+IS+ +FI 
Sbjct: 64 ITLDGKDVLEMEVDERACAGLFIAMQYPSEISGVTNADFLRSAINARREEGDEISLMKFI 123 

Query: 122 1 

K+DE ME I 

Sbjct: 124 f 

Query: 182 KOTSKGVNEMRGEGFGAMIITHTQRDLNYITPDKVHVMMDGKVVLSGGPEIAVR^ 241 

KWSKG+N+MR E FG ++ITHYQRLIMYITPD VHVMM G+W SGG EIA RLE EGY 
Sbjct: 184 KWSKGINKMRSENFGCLMITHYQRLLNYITPDVVHVMMQGRVVKSGGAELAQRLEAEGY 243 

Query: 242 AQIAEELGLE 251

I +ELG+E
Sbjct: 244 DWIKQELGIE 253 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5787> which encodes the amino acid 
sequence <SEQ ID 5788>. Analysis of this protein sequence reveals the following: 
Possible site: 48 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2417 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 225/255 (88%), Positives = 241/255 (94%)

Query: 1 MSVLEIKNLHVSIEDKEILKGLNLTLKTGEIAAIMGPNGTGKSTLSAAIMGNPNYEVTAG 60

MS+LEI NLHVSIE KEILKG+NLTLKTGE+AAIMGPNGTGKSTLSAAIMGNPNYEVT G
Sbjct: 1 MSILEINNLHVSIEGKEILKGVNLTLKTGEVAAIMGPNGTGKSTLSAAIMGNPNYEVTQG 60

Query: 61 EILFDGEDILELEVDERARLGLFLAMQYPSEVPGITNAEFIRAAMNAGKADDDKISIRQF 120

+IL DG +IL+LEVDERARLGLFLAMQYPSE+PGITNAEF+RAAMNAGKAD+DKIS+R F
Sbjct: 61 QILLDGVNILDLEVDERARLGLFLAMQYPSEIPGITNAEFMRAAMNAGKADEDKISVRDF 120

ITKLDEKM LLGMKEEMAERYLNEGFSGGEKKRNEILQLLMLEPKFALLDEIDSGLDIDA 
Sbjct: 121 ITKLDEKMALLGMKEEMAERYIiNEGFSGGEKKRNEILQLLMLEPKFALLDEIDSGLDIDA 180 

Query: 181 LKWS KGVNEMRGEGFGAMI ITHYQRLLNYITPDKVHWJMDGKVVLSGGPELRVRLEKEG 240 

LKWSKGVNEMRG+ FGAMIITHYQRLLNYITPD VHVMMDG++VLSG LA RLEKEG 
Sbjct: 181 LKOTSKGVNEMRGKDFGAMIITHYQRLIJ^ITPDLVHVI#roGRIvIiSGDAAI^ 240 

Query: 241 YAQ1AEELGLEYKEE 255 

YA IA++LG+EYKEE 
Sbjct: 241 YAGIAQDLGIEYKEE 255 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1862

A DNA sequence (GBSx1969) was identified in S.agalactiae <SEQ ID 5789> which encodes the amino
acid sequence <SEQ ID 5790>. This protein is predicted to be RgpG (rfe). Analysis of this protein sequence
reveals the following:

Transmembrane
Transmembrane
INTEGRAL Likelihood = Transmembrane
Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane 108)
INTEGRAL Likelihood = -4.78 Transmembrane 184 200 203)
INTEGRAL Likelihood = -3.13 Transmembrane 119 119 135)
INTEGRAL Likelihood = -2.97 Transmembrane 229 245 229 250)

Final Results 

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8919> which encodes amino acid sequence <SEQ ID 8920> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: 5.18 
GvH: Signal Score (-7.5): -6.19 
Possible site: 15
>>> Seems to have an uncleavable N-term signal seq










ALOM program count: 9 value: -12.10 threshold:

INTEGRAL Likelihood = -12.10 Transmembrane 255
INTEGRAL Likelihood = -9.82 Transmembrane 132 124 -
INTEGRAL Likelihood = -8.60 Transmembrane 262 - 278 ( 256 - 285)
INTEGRAL Likelihood = -7.48 Transmembrane 184 - 200 ( 182 - 208)
INTEGRAL Likelihood = -5.31 Transmembrane 94 75 - 9S
INTEGRAL Likelihood = -4.88 Transmembrane 18 - 34 ( 17 - 35)
INTEGRAL Likelihood = -4.78 Transmembrane 111 127 130
INTEGRAL Likelihood = -3.13 Transmembrane 46 - 62 ( 46 - 62)
INTEGRAL Likelihood = -2.97 Transmembrane 155 - 172 ( 156 - 177)
PERIPHERAL Likelihood = 12.63 284
modified ALOM score: 2.92

*** Reasoning Step: 3

Final Results 

bacterial membrane --- Certainty=0.5840 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA82114 GB:AB022909 RgpG [Streptococcus mutans]
Identities = 266/382 (69%), Positives = 317/382 (82%) 







Query: 10 TIEYIFVLIGAFLLSIILTPIIRVISLKVGAVDKPNARRINKVPMPSSGGLAIFLSFWT 69

T++++ VLI L S++LTP++R +L+VGAVD PNARRINKVPMPS+GGLAI +SFV+
Sbjct: 7 TLKFVLVLIATLLTSLVLTPLVRFFALRVGAVDNPNARRINKVPMPSAGGLAIIISFVIA 66

Query: 70 TLFFMPMAASRHFIEVSYFHYILPVIIGGLWITTGFIDDIFELRPRYKMLGIIIAAIII 129

TL MPM SYF YILPV++G LV+ TGFIDD++EL P+ K LGI++ A+II
Sbjct: 67 TLALMPMILKTQIGGKSYFEYILPWLGALVIALTGFIDDVYELSPKIKFLGILLGAVII 126

Query: 130 WKFTHFRFDSFKIPIGGPLLEFGPILTFFLTVLWIISITNAINLIDGLDGLVSGVSIISL

W FT FRFDSFKIP GGP+L F P L+FFLT+LW+++ITNA+NLIDGLDGLVSGVS+ISL
Sbjct: 127 WIFTDFRFDSFKIPFGGPMLHFNPFLSFFLTILWWAITNAVNLIDGLDGLVSGVSMISL 186

Query: 190 ATMAWSYFFLPKIDFFLTLTIVILIASIVGFFPYNYHPAIIYLGDAGALFIGFMIGVLS 249

TM +VSYFFL D FLTLTI +LI +1 GFFPYNYHPAI I YLGD GALFIGFMI VLS
Sbjct: 187 TTMGLVSYFFLYDTDIFLTLTIFVLIFAIAGFFPYNYHPAIIYLGDTGALFIGFMISVLS 246

Query: 250 LQGLKNSTAVAVITPVIILGVPILDTAVAIVRRKLSGKKISEADKMHLHHRLLSMGFTHR 309

LQGLKN+TAVAV+TP+ 1 +LGVP I +DT VAI+RR LSG+K EAD MHLHHRLLtMGFTHR
Sbjct: 247 LCGLKNATAVAVWPIIVLGVPIVDTI^IIPJlTLSGQKFYFjmNMHLHHRLLAMGFTHR 306

Query: 310 GAVLWYGIAIIFSLIALLLNVSSRIGGIFLLLALLLAMEIFIEGLNIWGENRTPLFNLL 369

GAVLWYGIA+ FSL++LLLNVSSR+GGI L+4 + A+EIFIEGL IWG RTPLF LL
Sbjct: 307 GAVLWYGIAMFFSLVSLLLNVSSRLGGILLMIGVAFALEIFIEGLEIWGPKRTPLFRLL 366

Query: 370 KFIGNSDYRQSVIAKYSDKHQK 391

FIGNSDYRQ V+AKY K +K
Sbjct: 367 AFIGNSDYRQEWAKYRRKKKK 388





A related DNA sequence was identified in S.pyogenes <SEQ ID 5791> which encodes the amino acid 
sequence <SEQ ID 5792>. Analysis of this protein sequence reveals the following: 
Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
Likelihood = 
Likelihood = 
Likelihood = -7. 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



233 - 



25 ( 1 - 

217 ( 198 - 

324 ( 305 - 

71 ( 51 - 

161 ( 138 - 

276 ( 251 - 

196 ( 172 - 

347 ( 330 - 

103 { 82 - 

129 { 112 - 

249 ( 232 - 



Final Results

bacterial membrane --- Certainty=0.4312 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 5 TIDYVLVLIGALLMSLFLTPLVRFLAFRVGAVDNPNARRVNKVPMPTSGGLAIFMSFLVA 64 

T+ +VLVLI LL SL LTPLVRF A RVGAVDNPNARR+NKVPMP++GGLAI +SF++A 
Sbjct: 7 TLKFVLVLIATLLTSLVLTPLWFFALRVGAVDNPNARRINKVPMPSAGGLAIIISFVIA 66 

Query: 65 SLGLIPIASKGAMFFGQTYFSYILPWIGATVITLTGFLDDLYELSPKLKMFGILIGAVI 124 

+L L+P+ K G++YF YILPW+GA VI LTGF+DD+YELSPK+K GIL+GAVI 

Sbjct: 67 TLALMPMILK-TQIGGKSYFEYILPWLGALVIALTGFIDDVYELSPKIKFLGILLGAVI 125 

Query: 125 VWAFTDFKFDSFKIPFGGPLLVFGPFLTLFLTVLWIVSITNAINLIDGLDGLVSGVSIIS 184 

+W FTDF+FDSFKIPFGGP+L F PFL+ FLT+LW+V+ITNA+NLIDGLDGLVSGVS+IS 
Sbjct: 126 IWIFTDFRFDSFKIPFGGPMLHFNPFLSFFLTILWWAITNAVNLIDGLDGLVSGVSMIS 185

Query: 185 LVTMAIVSYFFLPQKDFFLTLTILVLISAIAGFFPYNYHPAMIYLGDTGALFIGFMIGVL 244 

L TM +VSYFFL D FLTLTI VLI AIAGFFPYNYHPA+IYLGDTGALFIGFMI VL 
Sbjct: 186 LTTMGLVSYFFLYDTDIFLTLTIFVLIFAIAGFFPYNYHPAIIYLGDTGALFIGFMISVL 245 

Query: 245 SLQGLKNSTAVAWTPVIILGVPIMDTIVAIIRRSLSGQKFYEPDKMHLHHRLLSMGFTH 304 

SLQGLKN+TAVAWTP+ 1 +LGVPI+DT VAIIRR+LSGQKFYE D MHLHHRLL+MGFTH 
Sbjct: 246 SLQGLKNATAVAWTPIIVLGVPIVDTTVAIIRRTLSGQKFYEADNMHLHHRLIjAMGFTH 305 

Query: 305 RGAVLVVYGITMLFSLISLLLNVSSRIGG\T^LMLGLLFGLEVFIEGLEIWGEKRTPLFNL 364 

RGAVLWYGI M FSL+SLLLNVSSR+GG+LLM+G+ F LE+FIEGLEIWG KRTPLF L 
Sbjct: 306 RGAVLWYGIAMFFSLVSLLLNVSSRLGGILLMIGVAFALEIFIEGLEIWGPKRTPLFRL 365 

Query: 365 LKFIGNSDYRQAMLLKWKEKK 385
L FIGNSDYRQ ++ K++ KK 
Sbjct: 365 LAFIGNSDYRQEWAKYRRKK 386 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 282/384 (73%), Positives = 334/384 (86%), Gaps = 1/384 (0%) 

Query: 6 MIPFTIEYIFVLIGAFLLSIILTPIIRVISLKVGAVDKPNARRINKVPMPSSGGLAIFLS 65

M FTI+Y+ VLIGA L+S+ LTP++R ++ +VGAVD PNARR+NKVPMP+SGGLAIF+S
Sbjct: 1 MFSFTIDYVLVLIGALLMSLFLTPLVRFLAFRVGAVDNPNARRVNKVPMPTSGGLAIFMS 60

Query: 66 FWTTLFFMPMAAS-RHFIEVSYFHYILPVIIGGLWTTTGFIDDIFELRPRYKMLGIII 124 

F+V +L +P+A+ F +YF YILPV+IG V+T TGF+DD++EL P+ KM GI+I 
Sbjct: 61 FLVASLGLIPIASKGAMFFGQTYFSYILPWIGATVITLTGFLDDLYELSPKLKMFGILI 120 

Query: 125 AAIIIWKFTHFRFDSFKIPIGGPLLEFGPILTFFLTVLWIISITNAINLIDGLDGLVSGV 184 

A+I+W FT F+FDSFKIP GGPLL FGP LT FLTVLWI+SITNAINLIDGLDGLVSGV 
Sbjct: 121 GAVIVWAFTDFKFDSFKIPFGGPLLVFGPFLTLFLTVLWIVSITNAINLIDGLDGLVSGV 180

Query: 185 SIISLATMAWSYFFLPKIDFFLTLTIVILIASIVGFFPYNYHPAIIYLGDAGALFIGFM 244

SIISL TMA+VSYFFLP+ DFFLTLTT++LI++I GFFPYNYHPA+IYLGD GALFIGFM 
Sbjct: 181 SIISLVTMA1VSYFFLPQKDFFLTLTILVLISAIAGFFPYNYHPAMIYLGDTGALFIGFM 240 

Query: 245 IGVLSLQGLKMSTAVAVITPVIILGVPILDTAVAIVIUUCLSGKKISFADKMHLHHRLLSM 304 

IGVLSLQGLKNSTAVAV+TPVI ILGVPI+DT VAI+RR LSG+K E DKMHLHHRLLSM 
Sbjct: 241 IGVLSLQGLKNSTAVAWTPVIILGVPIMpTIVAIIRRSLSGQKFYEPDKMHLHHRLLSM 300 



Query: 305 GFTHRGAVlWYGIAIIFSLIALIjLNVSSRIGGIFLLLALLIiAMEn 

GFTHRGAVLWYGI ++FSLI+LLLNVSSRIGG+ L+L LL +E+FIEGL IWGE RTP 
Sbjct: 301 GFTHRGAVLWYGITMLFSLISLLLNVSSRIGGVLLMLGLLFGLEVFIEGLEIWGEKRTP 360 

Query: 365 LFNLLKFIGNSDYRQSVIAKYSDK 388

LFNLLKFIGNSDYRQ+++ K+ +K 
Sbjct: 361 LFNLLKFIGNSDYRQAMLLKWKEK 384 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1863 

A DNA sequence (GBSx1970) was identified in S.agalactiae <SEQ ID 5793> which encodes the amino
acid sequence <SEQ ID 5794>. This protein is predicted to be negative regulator of genetic competence.
Analysis of this protein sequence reveals the following:
Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3460 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9483> which encodes amino acid sequence <SEQ ID 9484> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA82113 GB:AB022909 negative regulator of genetic competence 
[Streptococcus mutans] 
Identities = 168/248 (67%) , Positives = 205/248 (81%) , Gaps = 9/248 (3%) 

Query: 1 MEMKQISETTLKITISMEDLEDRGMELKDFLIPQEKTEEFFYSVMDELDLPENFKNSGML 60
MEMKQISETTLKITISMEDLE+RGMSLKDFLIPQEKTEEFFY+VMDELDLPENFK SGML
Sbjct: 1 MEMKQISETTLKITISMEDLEERGMSLKDFLIPQEKTEEFFYTVMDELDLPENFKGSGML 60

Query: 61
SFRVTP+ DRIDVFVTKSE++K+LNLE+L+D DISKMSPEDFF TLE++M EKGD A
Sbjct: 61

Query: 121
h TQ+ E+ ++E+ 4
Sbjct: 121

Query: 181
EASEL+K YHMT+LL+LE++P Y+A+LM+ARMLEHA GTKTRAYL EH +QLI D
Sbjct: 172

Query: 241
Sbjct: 232

A related DNA sequence was identified in S.pyogenes <SEQ ID 5795> which encodes the amino acid
sequence <SEQ ID 5796>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3307 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/253 (67%), Positives = 209/253 (82%), Gaps = 2/253 (0%) 





Query: 1 MEMKQISETTLKITISMEDLEDRGMELKDFLIPQEKTEEFFYSVMDELDLPENFKNSGML 60

MEMKQISETTLKITISM+DLE+RGMELKDFLIPQEKTEEFFYSVMDELDLP+NFK+SGML
Sbjct: 3 MEMKQISETTLKITISMDDLEERGMELKDFLIPQEKTEEFFYSVMDELDLPDNFKDSGML 62

Query: 61 SFRVTPKKDRIDVFVTKSELSKDLNLEELADLGDISKIISPEDFFKTLEQSMLEKGDTD/AH 120

SFRVTP+KDR+DVFVTKSE++KD+NLE+LA+ GD+ S +M+ PEDFFK+LEQSM EKGD AH
Sbjct: 63 122

Query: 121 AKLAEIENMMDKATQEW- - EEWSEEQPEKEVETIGYVHYWDFDNIEAWRFSQTIDF 178

KL +IE +M+ + + + ++ E E + YVHYV DF I V F++TIDF
Sbjct: 123 EKLEKIEEIMEDWEATLANQSEAADPSTNHESEPLDYVHYVLDFSTITEAVAFAKTIDF 182

Query: 179 PIEASELYKNGKGYHMTILLDLENQPSYFANLMYARMLEHANVGTKTRAYLKEHSIQLIH 238

IEASELYK YHMTILLD++ QPSYFAN+MYAR++EHAN G+KTRAYL+EH +QL+
Sbjct: 183 SIFASELYKGSNCYHMTILLDVQQQPSYFANVMYARLIEHANPGSKTRAYLQEHGLQLML 242

Query: 239 DDAISKLQMIEMG 251

D A+ +LQ IE+G
Sbjct: 243 DGAVEQLQKIELG 255





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1864 

A DNA sequence (GBSx1971) was identified in S.agalactiae <SEQ ID 5797> which encodes the amino
acid sequence <SEQ ID 5798>. This protein is predicted to be BacA (bacA). Analysis of this protein
sequence reveals the following:

Possible site: 17 



>>> Seems to have no N-terminal signal sequence










INTEGRAL Likelihood = -9.02 Transmembrane 115 - 131 ( 111 - 135)
INTEGRAL Likelihood = -8.97 Transmembrane     - 243 ( 219 - 247)
INTEGRAL Likelihood = -7.86 Transmembrane 48 - 64 ( 44 - 69)
INTEGRAL Likelihood = -7.27 Transmembrane 263 - 279 ( 260 - 279)
INTEGRAL Likelihood = -7.22 Transmembrane 87 -     ( 85 - 107)
INTEGRAL Likelihood = -3.50 Transmembrane 2 - 18 ( 1 - 19)



Final Results 

bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD504S2 GB:AF169967 BacA [Flavobacterium johnsoniae] 
Identities = 101/275 (36%), Positives = 165/275 (59%), Gaps = 22/275 (8%) 

Query: 7 LKALFLGVVEGVTEWLPVSSTGHLILVQEFMKLNQSKSFVEMFNIVIQLGAIMAVIVIYF 66 

Query: 67 KKLNPFQPGKSAEEIRLTWQLWLKWIACIPSILIALPFDNWFEAHFNFMIPIAIALIFY 126

KR FQ T + K+++A IP++++ L ++ + + +A++L+ 

Sbjct: 63 KRF--FQ- TLDFYFKLLVAFIPAVVLGLLLSDFIDGLLENPVTVAVSLLIG 110 

Query: 127 GFVFI WVEKRNAHLKPQVTELASMSYKTAFLIGCFQVLSIVPGTSRSGATILGAII 182 

G + + W WA Q ++Y A IG FQ ++++PG SRSGA+I+G + 

Sbjct: 111 GLILLKVDEWFNNPNAAETSQ KITYLQALKIGLFQC I AMI PGVSRSGAS I VGGMS 165 

Query: 183 IGTSRSVAADFTFFLAIPTMFGYSGLKAVKYFLDGNVLSLDQSLILLVASLTAFWSLYV 242 

SR+ AA+F+FFLA+PTM G + K Y+ G LS DQ IL++ ++ AF+V+L 
Sbjct: 166 QKLSRTTAAEFSFFLAVPTMLGATVKKCYDYYKAGFELSHDQVNILIIGKWAFIVALLA 225 

Query: 243 IRFLTDYVKRHDFTIFGKYRIVLGSLLILYWLVVH 277 

I + ++ ++ F +FG YRI+ G +L+L +H 
Sbjct: 226 I KTFI SFLTKNGFKVFGYYRI IAGI I LLL I HFF I H 260 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5799> which encodes the amino acid 
sequence <SEQ ID 5800>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood =-11. Transmembrane 225 - 241 ( 219 - 247)
INTEGRAL Likelihood = -9. Transmembrane 115 - 131 ( 109 - 135)
INTEGRAL Likelihood = -7 Transmembrane 48 - 64 ( 44 - 69)
INTEGRAL Likelihood = -7 Transmembrane 87 - 103 ( 85 - 108)
INTEGRAL Likelihood = -5 Transmembrane 263 - 279 ( 2S2 - 279)
INTEGRAL Likelihood = -3. Transmembrane 2 - 18 ( 1 - 19)



Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



LKRIFFGIIEGITEWLPVSSTGHLILVQEFIRLNQDKAFIEMFNIVIQLGAIIAVMLIYF 66 
L+AI +IEGITE+LPVSSTGH+I+ F + + F ++F IVIQLGAI++V+++YF 
LQAIVLAVIEGITEFLPVSSTGHMIIASSFFGIAHED-FTKLFTIVIQLGAILSVWLYF 62 

: 67 ERLNPFQPGKTAREVQLTWQLWLKWIACIPSILIAVPLDNWFEAHFYFMVPIAIALIVY 126 
+R FQ T + K+++A IP++++ + L ++ + V +A++L+ + 

KRF--FQ TLDFYFKLLVAFIPAWLGLLLSDFIDGLLENPVTVAVSLLIG 110 

7 GIAFIWIEKRNAQQEPAVTELARMSYKTAFFIGCFQVLSIVPGTSRSGATILGAIILGTS 186 
G+ + +++ A T +++Y A IG FQ ++++PG SRSGA+I+G + S 

: 111 GLILLKVDEWFHNPNAAETS-QKITYLQALKIGLFQCIAMI PGVSRSGAS IVGGMSQKLS 169 

1 RTVAADFTFFLAIPTMFGYSGLKAVKFFLDGHHLDFAQVLILLVASLTAFWSLLAIRFL 246 

RT AA+F+FFLA+PTM G + K ++ G L QV IL++ ++ AF+V+LLAI+ 
) RTTAAEFSFFLAVPTMLGATVKKCYDYYKAGFELSHDQVNILIIGNWAFIVALLAIKTF 229 

: 247 TDYVKKHDFTIFGKYRIVLGSLLLIYSFF 275 

++ K+ F +FG YRI+ G +LL+ FF 
: 230 ISFLTKNGFKVFGYYRI IAGI ILLLIHFF 258 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/272 (83%) , Positives = 253/272 (92%) 



Query: 1 MLIIELLKALFLGVVEGVTEWLPVSSTGHLILVQEFMKLNQSKSFVEMFNIVIQLGAIMA 60
MLIIELLKA+F G++EG+TEWLPVSSTGHLILVQEF++LNQ K+F+EMFNIVIQLGAI+A
Sbjct: 1 MLIIELLKAIFFGIIEGITEWLPVSSTGHLILVQEFIRLNQDKAFIEMFNIVIQLGAIIA 60

Query: 61 VIVIYPKRLNPFQPGKSAREIRLTWQLWLKVVIACIPSILIALPFDNWFEAHFNFMIPIA 120 

V++IYF+RLNPFQPGK+ARE++LTWQLWLKWIACIPSILIA+P DNWFEAHF FM+PIA 
Sbjct: 61 WLIYFERmPFQPGKTAREVQLTWQLWLKVVIACIPSILIAVPLDNWFEAHFYFMVPIA 120 

Query: 121 IALIFYGFVFIWVEKRNAHLKPQVTELASMSYKTAFLIGCFQVLSIVPGTSRSGATILGA 180 

IALI YG FIW+EKRNA +P VTELA MSYKTAF IGCFQVLSIVPGTSRSGATILGA 
Sbjct: 121 IALIWGIAFIWIEKRNAQQEPAVTEIARMSYKTAFFIGCFQVLSIVPGTSRSGATILGA 180 

Query: 181 IIIGTSRSVAADFTFFtAIPTMFGYSGLKAVKYFLDGNVLSLDQSLILLVASLTAFWSL 240 

II+GTSR+VAADFTFFLAIPTMFGYSGLKAVK+FLDG+ L Q LILLVASLTAFWSL 
Sbjct: 181 1 1 LGTSRTVAADFTFFLAI PTMFGYSGLXAVKFFLDGHHLDFAQVL I LLVASLTAFWSL 240 

Query: 241 YVIRFLTDYVKRHDFTIFGKYRIVLGSLLILY 272 

IRFLTDYVK+HDFTIFGKYRIVLGSLL++Y 
Sbjct: 241 LAIRFLTDYVKKHDFTIFGKYRIVLGSLLLIY 272 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1865 

A DNA sequence (GBSx1972) was identified in S.agalactiae <SEQ ID 5801> which encodes the amino
acid sequence <SEQ ID 5802>. Analysis of this protein sequence reveals the following:
Possible site: 42

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.65 Transmembrane 494 - 510 ( 488 - 519) 
INTEGRAL Likelihood = -8.01 Transmembrane 263 - 279 ( 256 - 288) 
INTEGRAL Likelihood = -5.95 Transmembrane 25 - 41 ( 20 - 43) 
INTEGRAL Likelihood = -4.94 Transmembrane 475 - 491 ( 473 - 493) 
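
Each INTEGRAL row above lists a likelihood score, the core transmembrane span and, in parentheses, a wider extent. A minimal sketch of reading such rows into records follows; it assumes only the textual layout shown in this document, and the names `Segment`, `ROW` and `parse_segments` are hypothetical rather than part of the prediction program itself.

```python
import re
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    likelihood: float          # more negative values head the list in the reports above
    core: Tuple[int, int]      # residues printed after "Transmembrane"
    extent: Tuple[int, int]    # residues printed in parentheses


ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return one Segment per well-formed INTEGRAL row found in the text."""
    return [
        Segment(float(score), (int(a), int(b)), (int(c), int(d)))
        for score, a, b, c, d in ROW.findall(text)
    ]


report = """
INTEGRAL Likelihood = -8.65 Transmembrane 494 - 510 ( 488 - 519)
INTEGRAL Likelihood = -8.01 Transmembrane 263 - 279 ( 256 - 288)
INTEGRAL Likelihood = -5.95 Transmembrane 25 - 41 ( 20 - 43)
INTEGRAL Likelihood = -4.94 Transmembrane 475 - 491 ( 473 - 493)
"""

segments = parse_segments(report)
strongest = min(segments, key=lambda s: s.likelihood)
core_lengths = [end - start + 1 for _, (start, end), _ in segments]
```

Rows that are truncated or otherwise malformed simply fail to match and are skipped, which seems the safer behaviour when a report is incomplete.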

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
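
The Final Results block assigns a certainty value to each candidate localization (membrane, outside, cytoplasm) and labels it Affirmative or Not Clear. The sketch below, again assuming only the line layout printed here, collects those values and picks the localization with the highest certainty. The names `CERT` and `parse_localization` are illustrative, and the pattern deliberately tolerates spaces inside the number.

```python
import re

# "bacterial <site> ... Certainty=<number> (<label>)"; spaces inside the number are allowed.
CERT = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9][0-9 .]*?)\s*\(([^)]*)\)"
)


def parse_localization(text):
    """Map each localization to (certainty, label)."""
    results = {}
    for line in text.splitlines():
        match = CERT.search(line)
        if match:
            site, number, label = match.groups()
            results[site] = (float(number.replace(" ", "")), label.strip())
    return results


report = """
bacterial membrane --- Certainty=0 . 4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

localization = parse_localization(report)
best_site = max(localization, key=lambda site: localization[site][0])
```

Choosing the maximum certainty is a simplification made for this sketch; the reports themselves mark the preferred localization with the Affirmative label.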

A related GBS nucleic acid sequence <SEQ ID 9481> which encodes amino acid sequence <SEQ ID 9482> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB99606 GB:U67598 M. jannaschii predicted coding region MJ1577 
[Methanococcus jannaschii] 
Identities = 41/172 (23%), Positives = 78/172 (44%), Gaps = 19/172 (11%) 

Query: 479 LISFWIIYTLFIOTFTYFCIYLLLFGVIIjLIKKIIFMI^TRKISNGYIVTEDGASRVYQW 538 

+IS +4 ++ F+ ++ + ++ ++ II +T G ++ +W 

Sbjct: 442 VISILLAVFLYFIPKYSQTFNEVFYLSIVFWQNIILALTPTSLFGRWKANYYKEKL-EW 500 

Query: 539 TSFRNMLRDIKSFDRSELESIVLWNRILVYATLFGYADRVEKALR-VNQIDIPERFANID 597 

+F+N L ++ + E I +W L+Y T G D+V +A++ +N ++ +1 
Sbjct: 501 DAFKNFLSNIAMIKKYSPEDISIWIOVJLIYGTALGVGDKVVEAMKSLNLSELVADYVIIH 560 

Query: 598 SHQFAISVNQSSNHFSTITEDVSHA3NFSVKSGGSSGGFSGGGG--GGGGGA 647 

S+ ++ + S + ST GS GGF GGG GGGGGA 

Sbjct: 561 SNYDSMKTSVDSVYSSTT- GSGGGFGAGGGFGGGGGGA 597 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5803> which encodes the amino acid 
sequence <SEQ ID 5804>. Analysis of this protein sequence reveals the following: 

INTEGRAL Likelihood = -7.91 Transmembrane 486 - 502 ( 483 - 508)

INTEGRAL Likelihood = -5.89 Transmembrane 465 - 481 ( 460 - 483)

INTEGRAL Likelihood = -2.18 Transmembrane 244 - 260 ( 241 - 260)

Final Results

bacterial membrane --- Certainty=0.4163 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAB99606 GB:U67598 M. jannaschii predicted coding region MJ1577 
[Methanococcus jannaschii] 
Identities = 59/263 (22%), Positives = 106/263 (39%), Gaps = 14/263 (5%) 

Query: 369 FLDMAFGNKVTLPVDQLFSQYHYDADTIKQLKKTYKGKKLEQEVRQSSEQVIKAMKKASA 428

++ + G K+ + L + Y++D +K L K K + E +S Q K+ K 
Sbjct: 346 Y1KIMNGGKIEILKTDLENLDVYESDVMKFLMKYSKNNVFDPEYIKSLAQKYKSSKDKLK 405 

Query: 429 AI TNNVLET I KKLNLPDT YRQMTPA- - EKRKSNSVQGLGCLLLI LNSGLL I YLAI KESGL 486 
++EK+P ++AER + L+ ++L L ++

Sbjct: 406 KLKD ELDKIMEYPRYSSKWNAFLETRGKICI I IALLVI S I LLAVFLYFI PKYSQTFN 462 

Query: 487 ALIYLALMVLTMCLGFYISLKLDQYKKLGIETPEGGVRLHQWQSFKNMIRDIDKFEDVAI 546 
+ YL+++ + ILL G +W +FKN + ++ + + 

Sbjct: 463 EVFYLS I VFWQ NIILALTPTSLFGRWKANYYKEKLEWDAFKNFLSNLAMIKKYSP 518

Query: 547 EGLVVWNRVLOTATLFGYAKOTEKYLKOTRIALPEvYQAVRPGELSMVMYATTPTFVSSL 606 

E + +W L+Y T G KV +K ++ + V + Y + T V S+ 

Sbjct: 519 EDISIWKDWLIYGTALGVGDKWEAMKSLNLS ELVADYVI IHSNYDSMKTSVDSV 573 


Query: 607 SSATTSSNFSVSSGGGISGGGGG 629 

S+TT S +GGG GGGGG 

Sbjct: 574 YSSTTGSGGGFGAGGGFGGGGGG 596 

An alignment of the GAS and GBS proteins is shown below.

Identities = 241/635 (37%) , Positives = 372/635 (57%) , Gaps = 18/635 (2%) 

Query: 22 MKKCFLAI CLALSFFMVSVQADE VDYNI PHYEGNLTIHNDNSADFTEKVTYQFDSSYNGQ 81 
MKK + + L S + ++A +VDY+I +YEG L + +N+A F +KVTYQFD+SYNGQ 
Sbjct: 1 MKKILMTLVLCFSLLGIRIKAADvnYSITNYEGQLLLSKENTARFEQKVTYQFDTSYNGQ 60

Query: 82 YVTLGTAGKLPDNFDINNKPQvEVSINGKVRKVSYQIEDLEDGYRLKVFNGGEAGDTVKV 141 

Y++LG G LP F 1+ KP+VEV NG+ VS + DL DGYRLK++N G+AGD V V 
Sbjct: 61 YISLGRTGHLPAGFAIDQKPKVEVYQNGQQVPVSQEFSDLGDGYRLKLYNAGQAGDKVDV 120 


Query: 142 NVQWKLKNVLFMHKDVGELNWI P I SDWDKTLEKVDFWI STDKKVALSRLWGHLGYL - KTP 200 

V W+L ++L ++DV ELNW PISDWDKTLEKV ++T + S LW H GY K P 
Sbjct: 121 KVIWQLHHLLTAYQDVAELNWTPISDWDKTLEKVSLTVTTPTDIQDSNLWAHRGYYQKKP 180 



Query: 259 ILSFLLRILLPSFFIIA7TLFISIRVFLFRKKVNKYGQFPKEHHLYEAPEDLSPLELTQSI 318 

+L h ++P + L+ 1+ +K+ N+Y H YE PEDLSPL LTQ+I 

Sbjct: 241 LLQLLFGKVIPLVEVGFLLWQLIQFTRLKKQFKRYHLANHTDHSYEVPEDLSPLVLTQAI 300 

Query: 319 YSMSFKNFQ DEEKKTHL ISQEQLIQSILLDLIDRKVL NYDDNLLSLANLD 368 

Y SF E +K + ++ E L+Q+ LLDLID+KVIj h ++ LD 

Sbjct: 301 YGQSFAYLSPTASESQKLLIPKGVTFEALVQATLLDLIDQKVLLLTKEEGKAYLEISQLD 360 

Query: 369 RASDAEIDFIEFAFADSTSLKPDQLFSNYQFSYKETLRELKKQHKASDLQTQMRRRGSNA 428 

R +D E F++ AF + +L DQLFS Y + +T+++LKK +K L+ ++R+ 
Sbjct: 361 RVTDEEAAFLDMAFGNKVTLPVDQLFSQYHYD-ADTIKQLKKTYKGKKLEQEVRQSSEQV 419 

Query: 429 LSRITRLTRLISKDNINSLRRKGISSPYRKKSSEESKELSRLKRFSYLSPLISFWIIYT 488 
YR+M+ E ++ + ++





420 


Query: 


489 


Sbj Ct : 


479 


Sb j Ct : 


549 




609 


Sbjct: 


599 



I T +G R++QW SF+NM+RDI 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8921> and protein <SEQ ID 8922> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4 
McG: Discrim Score: 10.29
GvH: Signal Score (-7.5): 3.11

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 3 value: -8.65 threshold: 0.0

INTEGRAL Likelihood = -8.65 Transmembrane 475 - 491 ( 469 - 500)
INTEGRAL Likelihood = -8.01 Transmembrane 244 - 260 ( 237 - 269)
INTEGRAL Likelihood = -4.94 Transmembrane 456 - 472 ( 454 - 474)

PERIPHERAL Likelihood = 2
modified ALOM score: 2.23



Final Results 

bacterial membrane --- Certainty=0.4451 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no homology with any sequences in the databases. 

Example 1866

A DNA sequence (GBSx1973) was identified in S.agalactiae <SEQ ID 5805> which encodes the amino
acid sequence <SEQ ID 5806>. This protein is predicted to be glutamine-binding periplasmic
protein/glutamine transport system perme. Analysis of this protein sequence reveals the following:

Possible site: 24
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.86 Transmembrane 301 - 317 ( 295 - 324)
INTEGRAL Likelihood = -6.05 Transmembrane 479 - 495 ( 473 - 496)
INTEGRAL Likelihood = -0.59 Transmembrane 369 - 385 ( 369 - 385)

Final Results

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA17584 GB:D90907 glutamine-binding periplasmic protein 
[Synechocystis sp.] 

Identities = 147/534 (27%), Positives = 256/534 (47%), Gaps = 75/534 (14%) 

Query:


4 


Sbjct: 


24 


Query:


64 


Sbjct: 


69 


Query: 


124 


Sbjct: 


129 


Query: 


184 


Sbjct: 


183 


Query: 


241 


Sbjct: 


240 


Query: 


275 


Sbjct: 


300 


Query: 


335 


Sbjct: 


343 




390 


Sbjct: 


403 




450 


Sbjct: 


4S3 



ILLSLFTiUjLITFGGMTSIQADEYLRVGKE^YAPFNi'JTQNDJSrmGAVPIEGTDQYANGY 63 
+LL++ LL F ++ + + V E + PF T E T Q G+ 

EATGQLT-GF 58 



I- T ER + ++FS PY+ 



■-VTNFDSITSA 182 



+G DA +++RP 



3LVGKVGTAQSLTERSQANPN 299 



IY+E FRGTPM+VQ +IY+G 



Y++EI+RGGI S+D+GQ+EA +LG + QTM++++ PQ R ILP GNEF+ IKDTS 



VI EL+ G + TY+ F+ + +A++Y +LT + + +++E D 
7AVIGFQELFREGQLIVATTYRAFEVYIAVALVYLLLTTISSFVFKWLENYMD 516 

There is also homology to SEQ ID 1194.

A related GBS gene <SEQ ID 8923> and protein <SEQ ID 8924> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2 
McG: Discrim Score: 6.23 
GvH: Signal Score (-7.5): 0.11 

Possible site: 24 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 3 value: -8.

INTEGRAL Likelihood = -8.86 

INTEGRAL Likelihood = -6.05 

PERIPHERAL Likelihood = 1.32 
modified ALOM score: 2.27 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

34.3/57.3% over 462aa 

Synechocystis PCC6803 

EGAD|48193| glutamine-binding periplasmic protein/glutamine transport system permease
protein Insert characterized

GP|1652664|dbj|BAA17584.1||D90907 glutamine-binding periplasmic protein {Synechocystis
sp.} Insert characterized

PIR|S77250|S77250 hypothetical protein - Synechocystis sp. (strain PCC 6803) Insert
characterized

ORF01242(454 - 1809 of 2148)

EGAD|48193|sll1270(54 - 516 of 530) glutamine-binding periplasmic protein/glutamine
transport system permease protein {Synechocystis PCC6803} GP|1652664|dbj|BAA17584.1||D90907
glutamine-binding periplasmic protein {Synechocystis sp.} PIR|S77250|S77250 hypothetical
protein - Synechocystis sp. (strain PCC 6803)

%Match =12.3 

%Identity =34.2 %Similarity =57.2 

Matches = 128 Mismatches = 149 Conservative Sub.s = 86 

204 234 264 294 324 354 384 414

PSWCIPF*HKOTimFQ*DiroiEIDLVFR*NRRK*LIGGC*MKKILLSLFTALLiraGGMTSIQADEYLRVGMEAAYAP 

MKGMVKLGHWGKTWRYYLLLaLGVLIAIAIPLLPAFSQVS 
10 20 30 40 


444 474 495 525 555 585 615 645 

FNWTQNDNTNGAVPIEGTDQ- - - YANGYDVQVAKKIAKKIjMKKWWKTKWEGLVPALTSGKLDMI IAGMSPTEERKKEI 
I I 11= |>|| = = = = I = ==|:=lll I = h " I II = « 

RQTIIVATEPTFPPFEMTDEATGQLTGFDVDLIQAIGEAAQVTVDIQGYPFDGIIPALQSNTVGAAISAITITPERAQSV 
50 60 70 80 90 100 110 120



NFSKPYYISEPTLWNAEGKXTNAKNISDFKNAKVTAQQG 

,H ||= | : | : | | ||: I = = I = : I I : l :: I =1 II = = = 

SFSSPYFKSVLAIAVQ-DGNDT- IKNLKDLEGKRLAVAIGTTGAMVATNVPGAKVTNFDSITSALQELV-NGNADAVIND 



903 957 987 

RP DATSAQTANPIOjK-MIELHQG-FKTSDADTNISV 

II I III = III 
RPVLLYAIKDAGLRNVKISADV NPPFLPLVAPSLVGKVGTAQSLTERSQANPNDNFLITLFRNLFKGS- - 



1017 1047 1077 1107 1137 1167 1197 1227 

G^KGDIffilNQWQVIiESISRDKQIALMDKMIKEQPSVKKEKNC-KPNFFEQMATILKNNGSQFLRGTATTLLISMVGTIV 

:: ::: | 

. TT.TVT.T.TAF 



1257 1284 1314 1344 1374 1404 1419 1449

GLFIGLLIGV-FRTAPKSDNKLKAALQKDLGWLLNIYIEVFRGTPMIVQSMVIYYGTAQAF GVSLDRTLAAI FI V 

=1 11= I I II II == 11=1 111111=11 =11=1 I l===ll III = 

SVFFGLIGGTGVAIALISD IKPLQLIFRIYVEFFRGTPMLVQLFIIYFGLPALFKEIGLGITIDRFPAAIIAL 

340 350 360 370 380 390 


1479 1509 1539 1569 1599 1629 1659 1689 

SINTGAYMSEIWGGIFSVDKGQFEAATALGFTHGQTMRKIVLPQVVROT 
, 1=1 ll==ll=llll 1=1=11=11 =11 = lll==:=:|l I III 1111= 11111= II 11= I 
SLOTAAYLAEIIRGGIQSIDQGQWEACESLGMSPWQTMKEVIFPQAFRRILPPLGNEFITLIKDTSLTAVIGFQELFREG 
410 420 430 440 450 460 470

1719 1749 1779 1809 1839 1869 1899 1929 

OTVATQTYQYFQTFTIIAIIYFILTFTVTRILRYIEKRFDSDNYTTGANQLQV*EVGMTQAILEIKHLKKSYGSNEVL 
= ||= 1= = =l==l==ll = =====1 I 

QLIVATTYRAFEWIAVALWLLLTTISSWFKWLENYMDPIGRAKKKAKAATA
490 500 510 520 530 



There is also homology to SEQ ID 5804. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1867

A DNA sequence (GBSx1974) was identified in S.agalactiae <SEQ ID 5807> which encodes the amino
acid sequence <SEQ ID 5808>. This protein is predicted to be ATP-binding. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3208 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73160 GB:AL139076 putative glutamine transport ATP-binding 
protein [Campylobacter jejuni] 
Identities = 132/241 (54%), Positives = 178/241 (73%), Gaps = 1/241 (0%) 



Query segment starts: 5, 65, 185
Sbjct segment starts: 1, 61, 121, 180



N+L+K D+N R+K+ MVFQ J 



V++R+ FMDKG IA +PK++FENP+ ER +EFL + L 
NVANRI FFMDKGKIAVDAS PKEVFENPSNERLREFLNKVL 240 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2157> which encodes the amino acid 
sequence <SEQ ID 2158>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1170 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 212/246 (86%) , Positives = 237/246 (96%) 

Query: 1 MTQAILEIKHLKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKSTFLRSINLLEEPSGG 60 

M+ +I+EIK+LKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKST LRSINLLEEPS G 
Sbjct: 24 MSNSIIEIKNLKKSYGSNEVLKDISLSVNKGEVISIIGSSGSGKSTLLRSINLLEEPSAG 83 

Query: 61 EILYHGHNVLEKGYDIM«REKLG^FQSFNLFENI^ILENAIVAQTTVLKRERQEAEKI 120 

+IL+HG +VL 4- Y+L +YREKLGMVFQSFNLFENIiW4-LENAIVAQTTVLKR+R +AE+I 
Sbjct: 84 QILFHGEDVLAEHYmTHYREKLGMVFQSFNLFENLNV^^ 143 

Query: 121 AKENLNAVGMTEQYWKAKPKQLSGGQKQRVAIARALSVNPEAILFDEPTSALDPEIWGEV 180 

AKETO^NAVGMTEQYW+AKPKQLSGGQKQRVAIAPJJjSVNPEA+IiFDEPTSALDPEMVGEV 
Sbjct: 144 AKENLNAVGMTEQYWQAKPKQLSGGQKQRvAIARALSVNPEA^IIlFDEPTSALDPE^lVGEV 203 

Query: 181 LKTMQDI^SGLTMIIWHE^FAKWSDRVIFMDKBIIAEQGTPKQLFENPTQERTKEF 240 

LKTMQDLAKSGLTMIIVTHEMEFA++VSDR+IFMDKG+I E+G+P+Q+FENPTQ+RTKEF 
Sbjct: 204 LKTMQDLAKSGLTMIIVTHEMEFARDVSDRIIFMDRGLITEEGSPQQIFENPTQDRTKEF 263

Query: 241 LQRFLK 246

LQRFLK 
Sbjct: 264 LQRFLK 269 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
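
The Identities, Positives and Gaps figures quoted with each alignment are simple ratios of a residue count to the aligned length. The following is a minimal sketch, not part of the original analysis, of how such a summary line could be parsed and its percentages recomputed; the function name `parse_alignment_stats` is hypothetical. It assumes the percentage is truncated rather than rounded, which agrees with most of the summary lines printed in this document.

```python
import re

# Matches each "Name = matches/length" pair in a BLAST-style summary line, e.g.
# "Identities = 212/246 (86%) , Positives = 237/246 (96%)".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, percent)} for one summary line.

    Hypothetical helper: the percent is recomputed with integer division
    (truncation), which is an assumption about how the report was generated.
    """
    stats = {}
    for name, num, den in STAT.findall(line):
        num, den = int(num), int(den)
        stats[name] = (num, den, 100 * num // den)
    return stats


if __name__ == "__main__":
    line = "Identities = 212/246 (86%) , Positives = 237/246 (96%)"
    for name, (num, den, pct) in parse_alignment_stats(line).items():
        print(f"{name}: {num}/{den} = {pct}%")
```

Recomputing the percentage from the two counts is a quick way to spot transcription errors in either the counts or the printed value.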

Example 1868 

A DNA sequence (GBSx1976) was identified in S.agalactiae <SEQ ID 5809> which encodes the amino
acid sequence <SEQ ID 5810>. This protein is predicted to be hypersensitive-induced response protein.
Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-17.94 Transmembrane 4 - 20 ( 1 - 28)

Final Results

bacterial membrane --- Certainty=0.8175 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 9479> which encodes amino acid sequence <SEQ ID 9480> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF68390 GB:AF236374 hypersensitive-induced response protein
[Zea mays]

Identities = 127/275 (46%) , Positives = 174/275 (63%) , Gaps = 1/275 (0%)



Query: 19 ITSLYWKQQTVAIIERFGKyQKTATSGIHIRVPLGinKIAARVQLRLLQSEIIVETKTK 78 
I L V Q TVAI E FGK4 + G H +IA + LR+ Q ++ ETKTK 

Sbjct: 4 ILGLVQVDQSTVAIKENFGKFSEVLEPGCHFLPWCIGQQIAGYLSLRVRQLDVRCETKTK 63



Query: 79 DNVFVTLNIATQYRWENNVTDAYYKLIKPEAQIKSYIEDALRSSVPKLTLDELFEKKDE 138 

DNVFVT+ + QYR + +DA+YKL QI+SY+ D +R++VPKL LD+ FE+K+E 

Sbjct: 64 DNVFVTWASVQYRAIADKASDAFYKLSNTREQIQSYVFDVIRATVPKLGLDDAFEQKNE 123 

Query: 139 IALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSHNEINAAQRKRVAAQELANADKI 198 

IA V+ ++ + MSTYGY IV+TLI +EPD VK-H-MNEINAA R RVAA E A A+KI 
Sbjct: 124 IAKAVEEELEKAMSTYGYQIVQTLIVDIEPDDRVKRANINEINAAARMRVAASEECAEAEKI 183 

Query: 199 KIVTAAEAEAEKDRLHGVGIAQQRI<AIVD3LADSIQELI<DANVTLTEEQIMSILLTNQYL 258 

+ AE EAE L GVGIA+QR+AIVDGL DS+ + T + IM ++L QY 

Sbjct: 184 LQIKKAEGEAESKYI J AGVGIARQRQAIVD3LRDSVIAFSENVPGTTAKDIMD^^VTQYF 243 



Query: 259 DTLNTF-AINGNQTIFLPNNPEGVEDIRTQVLSAL 292 

DT+ A + + ++F+P+ P V+D+ Q+ L 
Sbjct: 244 DTMREIGASSKSSSVFIPHGPGAVKDVSAQIRDGL 278 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5811> which encodes the amino acid
sequence <SEQ ID 5812>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.06 Transmembrane 5 - 21 ( 1 - 29)

Final Results

bacterial membrane --- Certainty=0.6222 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF68390 GB:AF236374 hypersensitive-induced response protein
[Zea mays]

Identities = 126/273 (46%) , Positives = 174/273 (63%) , Gaps = 3/273 (1%) 

Query: 23 LYWRQQSVAIVERFGRYQKTATSGIHIRLPFGI -DKIAARVQLRLLQSEI IVETKTKDN 81 

L V Q +VAI E FG++ + G H LP+ I +IA + LR+ Q ++ ETKTKDN 
Sbjct: 7 LVQVDQSTVAIKENFGKFSEVLEPGCHF-LPWCIGQQIAGYLSLRVRQLDVRCETKTKDN 65 

Query: 82 VFVTLlWATQYRvNEQNvTDAYYKLMKPESQIKSYIEDALRSSVPKLTLDELFEKKDEIA 141 

VFVT+ + QYR +DA+YKL QI+SY+ D +R++VPKL LD+ FE+K+EIA 

Sbjct: 66 VFVTWASVQYRAI^KASDAEYKLSHTREQIQSYVFDVIRATVPKLGLDDAFEQKNEIA 125 

Query: 142 LEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVICQSMNEINAAQRKRVAAQELANADKIKI 201 

V+ ++ + MSTYGY IV+TLI +EPD VK++MNEINAA R RVAA E A A+KI 
Sbjct: 126 KAVEEELEKJMSTYGYQIVQTLIvDIEPDDRVKRAMNEIMAAARMRVAASEKAEAEECtLQ 185 

Query: 202 VTAAEAEAEKDRLHGVGIAQQRKA1 VDGLAESIQELKEANISLNEEQIMSILLTNQYIjDT 261 

+ AE EAE L GVGIA+QR+MVDGL +S+ E + 1M ++L QY DT 

Sbjct: 186 IKKAEGEAESKYIAGVGIARQRQAlv^lGLRDSVLAFSENVPGTTAKDIMD^WLVTQYFDT 245 

Query: 262 LNTFA^G-NQTLFLPNTPSGVEDIRTQVLSAL 293 

+ A + ++F+P+ P V4D+ Q-i- L 
Sbjct: 246 MREIGASSKSSSVFIPHGPGAVKDVSAQIRDGL 278 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 254/291 (87%) , Positives = 278/291 (95%) 

IILWILVLVIVLLITSLYVVICQQTVAIIERFGKYQKTATSGIHIRVPIjGIDKIAARVQL 64 
I + +++++ ++ + -t-LYW+QQ+VAI +ERFG+YQKTATSGIHIR+ P GIDKIAARVQL 
IFIAFGVIVILAIVASTLYWRQQSVAIVERFGRYQKTATSGIHIRLPFGIDKIAARVQL 65 

RLLQSEIIVETKTKDNVFVTLNIATQYRVlSnilWR^ 124 
RLLQSEIIVETKTKDNVFVTLN+ATQYRVNE NVTDAYYKL+KPE+QIKSYIEDALRSSV 



Query segment starts: 5, 65, 125, 185, 245
Sbjct segment starts: 6, 66, 126, 186, 246



PKLTLDELFEKKDEIALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMKEINAAQR 
PKLTLDELFEKKDEIALEVQHQVAEEMSTYGYIIVKTLITKVEPDAEVKQSMNEINAAQR 185 

KRVAAQELANADKI KI VTAAEAEAEKDRLHGVGIAQQRKAI VDGLADS I QELKDANVTLT 244 
KRVAAQELAmDKIKIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLA+SIQELK+AN++L 
KRVAAQELANADKIKIVTAAEAEAEKDRLHGVGIAQQRKAIVDGLAESIQELKEANISLN 245 

EEQIMSILLTNQYLDTLNTFAINGNQTIFLPKNPEGVEDIRTQVLSALKTR 295 
EEQIMSILLTNQYLDTUNTFA GNQT+FLPN P GVEDIRTQVLSALKT+ 
EEQIMSILLTNQYLDTLNTFAAKGNQTLFLPNTPSGVEDIRTQVLSALKTK 296 

SEQ ID 5810 (GBS231) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 55 (lane 7; MW 60.9kDa). 

GBS231d was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 155 (lane 5-7; MW 59kDa) and in Figure 239 (lane 11; MW 59kDa). It was also expressed 
in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 155 (lane 9; 
MW 34kDa) and in Figure 183 (lane 6; MW 34kDa). Purified GBS231d-GST is shown in Figure 246, lane 
8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1869

A DNA sequence (GBSx1977) was identified in S.agalactiae <SEQ ID 5813> which encodes the amino
acid sequence <SEQ ID 5814>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2305 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9291> which encodes amino acid sequence <SEQ ID 9292> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13457 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis]
Identities = 259/514 (50%), Positives = 350/514 (67%), Gaps = 9/514 (1%)



Query segment starts: 1, 121, 181, 241, 298, 358, 418, 478
Sbjct segment starts: 46, 106, 166, 223, 280, 340, 400, 460, 520



M ++M 4GA+EV +G4VG LSKG+LMGARGNSGVI SQLFRGF K4E+ 



• +L +TP++LPVLKEVGWDSGG+GL+ +YEGFL-+L GE + KA ++ +MV+ 



v TEDI++G+CTEVMV L Q 



EIVKvHVHTEDPGLVMQEGLKYGSLVKVKVENMRNQHDA- - -QMQKVEVEETVKETKEYG 297 

+ KVH+H E+PG V+ YG L+K+K+ENMR QH + Q K ET + YG 

SIAKVHIHAEEPGNVINYAQHYGELIKIKIENMREQHTSIISQESKPADNETPPAKQPYG 339 



DT IDG +1 + D +G+++G 1+ ++ + A K +MI ED EIVTI GED Q 



-L E YE++EVEIH G QP+Y Y++S E 
? LSEKYEEIEVEIHNGKQPLYSYIVSAE 553 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5635> which encodes the amino acid 
sequence <SEQ ID 5636>. Analysis of this protein sequence reveals the following: 

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1816 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 434/511 (84%) , Positives = 475/511 (92%) 

Query: 1 MGMTMENC3AKEVSDKPATTVGEVGQILSKGVLMQARGNSC3VITSQLFRGFGQSIKDKEEL 60 

M MTM+NGAKEV+DKPA+TVGEVGQ+LSKG+LMGARGNSGVITSQLFRGFGQSIK K+EL 
Sbjct: 44 MSMTMDNGAKEVADKPASTVGEVGQMLSKGLLMGARGNSGVITSQLFRGFGQSIKGKDEL 103 

Query: 61 TGQDIAHAFQNG VEVAYKAVMKPVEGT I LTVSRGAATAALKKAEETDDAVEVMRA.TLKGA 120 

TG+DLA AFQ GVEVAYKAVMKPVEGTILTVSRGAATAALKKA+ TDDAVEVM+A L GA 
Sbjct: 104 TGKDLAQAFQVGVEVAYKAVMKPVEGTILTVSRGAATAALKKADLTDDAVEVMQAALDGA 163 

Query: 121 KRALAKTPDMLPVLKEVGVVDSGGQGLVFIYEGFLSALTGEYIASEDFKATPATMTEMVN 180 

K ALAKTPD+LPVLKEVGWDSGGQGLVFIYEGFLSAL G+Y+ S DFKATPA M+EM+N 
Sbjct: 164 KGALAKTPDLLPVLKEVGWDSGGQGLVFIYEGFLSALNGDYVTSADFKATPANMSEMIN 223 

Query: 181 AEHHKAWGHVATEDIKYGYCTEVMVGLKQGPTYViaiFNYEEFQGYLSNLGDSLLVVNDD 240 

AEHHK+ WGHVATED I YGYCTE+MV LKQ3PTYVKEFNY+EFQGYLS LGDSLLWNDD 
Sbjct: 224 AEHHKSWGHVATEDITYGYCTEIMVALKQGPTYVKEFNYDEFQGYLSGLGDSLLWNDD 283 

Query: 241 EIVKVHVHTEDPGLWQEGLKYGSLVKVKVETMRNQHDAQMQKVEVEETVKETKEYGIIA 300 

EIVKVHVHTEDPGLVMQEGLKYGSL+K4-KV+NMRNQH+AQ+QK +VE+ E K++G+IA 
Sbjct: 284 EIVKA7HVHTEDPGLVMQEGLKYGSLIKIKVDNMRNQHEAQVQKTDVEKNKAEVKDFGLIA 343 

Query: 301 



Query: 361 S 

AA+WDIPAAW TRTVPQGFTSLLAFDP+KSLE NVADM+ SLSDV+ SGS VTLAVRDTT 
Sbjct: 404 AMVVDIPAAWATRWPCK3FTSLIAFDPSKSLEDNVADMSTSLSDWSGSVTIAVRDTT 463 

Query: 421 IDGLEIHE^ILGMVDGKILVSTPD^KAI^TFDKMIDEDSEIOTIYVGEDGKQaiAET 480 

IDGLEIHEND LGMVDGKI+VS PDME LK F+KMIDEDSEIVTI+VGE+G Q LAE 
Sbjct: 464 IDGDEIHENDFLGMVDGKI IVSNPDMEATLKAAFEKMIDEDSEIVTI FVGEEGDQDLAEE 523 

Query: 481 LSEYLEETYEDVEVEIHQGDQPVYPYLMSVE 511 

L+ YL ETYEDVEVEIHQGDQPVYPYLMSVE 
Sbjct: 524 IiAGYLGETYEDVEVEIHQGDQPVYPYLMSVE 554 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1870 

A DNA sequence (GBSx1978) was identified in S.agalactiae <SEQ ID 5815> which encodes the amino
acid sequence <SEQ ID 5816>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4771 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1871

A DNA sequence (GBSx1979) was identified in S.agalactiae <SEQ ID 5817> which encodes the amino
acid sequence <SEQ ID 5818>. This protein is predicted to be proliferating-cell nucleolar antigen P120.
Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.3774 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9345> which encodes amino acid sequence <SEQ ID 9346>
was also identified.



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC74905 GB:AE000278 putative nucleolar proteins [Escherichia 
coli K12] 

Identities = 87/229 (37%) , Positives = 128/229 (54%) , Gaps = 8/229 (3%) 

ITTGLVYSQEPAAQ- - IVAQIAEPQEGMKVLDLAAAPGGKTTHLLSYLNNTGLLV 12 
I +GL Y QE ++ + A A+ +V+D+AAAPG KTT + + +NN G ++ 



Query segment starts: 63, 181, 240
Sbjct segment starts: 89, 149, 209, 269



+NE S R K+L N+ R G NV +T+ 



L GG LVYSTCT + EENE V 



A related DNA sequence was identified in S.pyogenes <SEQ ID 5819> which encodes the amino acid 
sequence <SEQ ID 5820>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2316 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 213/311 (68%), Positives = 254/311 (81%), Gaps = 3/311 (0%) 

Query: 1 MKLPNEFIEKYQTILIODEAEAFFDSFEQKPISAYRTNPLKEKQLDFPNAIPSTPWGHYGK 60 

M LP EFI YQ IL E E F SF Q+P++A+R NPLK + F + IP+T WG+YGK 
Sbjct: 2 MSLPKEFINTYQAILGKELEDFIASFNQEPVNAFRINPLKNQLKTFEHPIPNTLWGYYGK 61 

Query: 61 ISGKSIEHTTGLVYSQEPAAQIVAQIAEPQEGMKVLDLAAAPG<3KTTHIjLSYLNNTGLLV 120 

+SGKS EH +GLVYSQEPAAQ+VAQ+A PQ+G +VLDLAAAPGGK+THU.+YL4NTGLLV 
Sbjct: 62 LSGKSPEHVSGLVYSQEPAaQIWAQVAAPQKGSH'i/LDLAAAPGGKSTHLLAYLDNTGLLV 121

Query: 181 PQAIQYWHKDYPTECaQLQRDILKEaiKMLAHGGILTCSTCTWSPEENEEVVNWLLQEYD 240

P AIQYWH YP ECA+LQ+ IL++A+ ML GG L+YSTCTW+PEENE+W WLL+ Y 
Sbjct: 182 PDAIQYWHHGYPAECAKLQICSILEDALA^KPGGELIYSTCTWAPEENEDWQWLLETYT 241 

Query: 241 YLELVD I PKLNGMVEGINVPQVARMYPHHFQGEGQFVAKLRDTRS KEAQK1 KPKAQKIM- 299 

+LELVD+PKLNGMV GI +P+ ARMYPH +QGEGQFVAKL+D R +E Q K KA K N 
Sbjct: 242 FLELVDVPKLNGWSGIGLPETARI'TiTHRYQGEGQ^/AKLKDKR-QEGQSTKLKAPKSNL 300 

Query: 300 -KMQLQLWQQF 309 

K QL+LW+ F 
Sbjct: 301 IKDQLRLWKMF 311 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 1872 

A DNA sequence (GBSx1980) was identified in S.agalactiae <SEQ ID 5821> which encodes the amino
acid sequence <SEQ ID 5822>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4111 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 6 

Sbjct: 9 

Query: 66 

Sbjct: 69 EEGQGDKIHS--LEGWWIIDPIDGTMNFVHQQKNFAISIGIFENGEGKIGLIYDWHDE 126 

Query: 123 LYSGGGHFDVYANDKKIVPFQECPLERCLLGWSAMYAEN DCGIAHLASETLGVRI 178 

LY Y N+ K+ P +E +E +L +N+ EN +A L G R 

Sbjct: 127 LYHAFSGRGAYMNETKLAPLKETVIEEAIIAINATWn?ENRRIDQSVLAPLVKRVRGTRS 186 

Query: 179 YGGAG1SMAKVMQGKLLAYFSY-IQPWDYAAAKIMGETLGFTLLTLDGEEPNYSTRQKVM 237 

YG A + +A V G++ AY + + PWDYAA ++ +G T T++GE + V+ 
Sbjct: 187 YGSAALELANVAAGRIDAYITMRIoAPVTOYAAGCVIjIiNEVGGTyTTIEGEPFTFLENHSvIi 246 



A related GBS nucleic acid sequence <SEQ ID 10937> which encodes amino acid sequence <SEQ ID 
10938> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5823> which encodes the amino acid
sequence <SEQ ID 5824>. Analysis of this protein sequence reveals the following:

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1843 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 155/253 (61%) , Positives = 205/253 (80%) 

Query: 1 MDAKFDFAKQLVYKAGQFIKSEKQNTFDTO3KSRFDDL\TSLDKKTQKLLIQEIIQHYPD 60 

++ K+ FA+Q++ +AG FIKS+M D++ K++FDDLVT++D++TQ+LL+ I Q YP 
Sbjct: 8 LETKYAFARQIIKEAGLFIKSKMSEQLDIQVKTQFDDLVTNVDQETQQLLMDRIHQTYPC 67 

Query: 61 DNIJIAEEDBVRSPIAQGIWWVLDPIDGrVNFIVQKDNFAVMLAYYEEGVGQFGIIYDVMA 120 

D ILAEE++VR PI QGNVWV+DPIDGTWFIVC FAVM+AYYE+G+GQFG+IYDVMA 
Sbjct: 68 DAIIAEENDVRHPINQGNVWVIDPIDGTVNFIVQGSQFAVMIAYYECGIGQFGLIYDVMA 127 

Query: 121 DILYSGGGHFDVYANDKKIVPFQECPLERCI.LGVNSAMYAENDCGIAHLASETLGVRIYG 180 

D L +GGG F+V N K+ +QE PLER L+G N+ M+A ND +AHL ++TLGVR+YG 
Sbjct: 128 DQLIAGGGDFEOTI^GDKLPAYQEKPLERSLIGCNAGMFARNDRNLAHLIAKTLGVRVYG 187 

Query: 181 GAGISMAiOTMQGKLLAYFSYIQP^nDYAAAKIMGETLGFTLLTLDGEEPNYSTRQKVMFLP 240 

GAGI M KVM+ +LLAYFS+IQPWDYAAAK++G+ LG+ LLT+DG EP++ TRQK+MF+P 
Sbjct: 188 GAGIC^IVKVMKQELIAYFSFIQPWDYAAAXVLGDKLGYVLLTIDGYEPDFQTRQKIMFVP 247 

Query: 241 KSKLNLIQSYLTK 253 

K +L I S+LTK 
Sbjct: 248 KCQLTRIASFLTK 260 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1873 

A DNA sequence (GBSx1981) was identified in S.agalactiae <SEQ ID 5825> which encodes the amino
acid sequence <SEQ ID 5826>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4131 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC24938 GB:AF012285 unknown [Bacillus subtilis]
Identities = 33/78 (42%) , Positives = 50/78 (63%) 

^ Y YP++ W TE+ V+ F QVE AYE* ++LL +Y+ FK++V KA+EK++ E 

Sbjct: 3 YQYPMNEDWTTEEAVDVIAFFQQVELAY3KGADREELLKAYRRFECEIVPGKAEEKKLCGE 62 

Query: 73 FQRTSGYSTYQAVKAAQQ 90 

F+ S YS Y+ VK A++ 
Sbjct: 63 FEEQSTYSPYRTVKQARE 80 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5827> which encodes the amino acid 
sequence <SEQ ID 5828>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 59/91 (64%) , Positives = 70/91 (76%)



Query: 9 ISS1WSYPLDPSWOTEDITKVLRFLNQVEHAYENSIKVDDLLDSYKEFKKVVKSKAQEKQ 68 

+S NY YPLD SW+TE+I+ VL FLN+VE AYE + LLDSYK +K +VKSKAQEKQ 
Sbjct: 5 MSGNYYYPLDLSWSTEEISSVLHFLNKVrELAYEKKVDAKQLLDSYKTYKTIVKSKAQEKQ 64 

Query: 69 IDREFQRTS6YSTYQAVKAAQQQAKGFISLG 99 

IDR+FQ+ SGYSTYQ VK A+ KGF SLG 
Sbjct: 65 IDRDFQKVSGYSTYQWKKAKAIEKGFFSLG 95 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1874 

A DNA sequence (GBSx1982) was identified in S.agalactiae <SEQ ID 5829> which encodes the amino
acid sequence <SEQ ID 5830>. Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence (or aa 1-18) 

Final Results

bacterial cytoplasm --- Certainty=0.0952 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF21893 GB:AF103794 unknown [Listeria monocytogenes]

Identities = 74/126 (58%) , Positives = 101/126 (79%) 

Query: 1 MITLFLSPSCTSCRKARAWLSKHEVAFEEHNI1TSPLNKEELLQILSFTE1NGTEDIISTR 60 
M+TL+ SPSCTSCRK+RAWL +H++ ++E NI + PL+ +E+ +IL TE+GT++I ISTR 
Sbjct: 1 MVTLYTSPSCTSCRKSRAWLEEHDIPYICERNIFSEPLSLDEIKEILRMTEDGTDEIISTR 60



Query: 61 SKVFQKIAIDVDELSTSSLMELISENPSLIiRR.pl ILDKKRMQIGFNEDEIRAFLPRDYRK 120 

SK FQKL +D+D L L ELI +NP LLRRPII+D+KR+Q+G+NEDEIR FLPR R 
Sbjct: 61 SKTFQKLNVDLDSLPLQQLFELIQKNPGLLRRPIIIDEKRLQVGYNEDEIRRFLPRRVRT 120 


Query: 121 QELKQA 126 
Sbjct: 121 YQLREA 126 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5831> which encodes the amino acid
sequence <SEQ ID 5832>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0511 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 112/134 (83%) , Positives = 127/134 (94%) 

Query: 1 MITLFLSPSCTSCRKARAWLSKHEVAFEEHNIITSPLNKEELLQILSFTENGTEDIISTR 60 
M+TLFLSPSCTSCRKARAWL KHEV F+EHNIITSPL+++EL+ ILSFTENGTEDI ISTR 
Sbjct: 1 MVTLFLSPSCTSCRKARAWLVKHEVDFQEHNIITSPLSRDELMSILSFTENGTEDIISTR 60

Query: 61 SKVFQKIAIDVDELSTSSLMELI SENPSLLRRP 1 1 LDKKRMQIGFNEDEIRAFLPRDYRK 120 

SKVFQKL IDV+ELS S L++LI++NPSLLRRPII+D+KRMQIGFNEDEIRAFL RDYRK 
Sbjct: 61 SKVFQKLDIDVEELSISDLIDLIAKNPSLLRRPIIMDQKRMQIGFNEDEIRAFLSRDYRK 120

Query: 121 QELKQATIRAEIEG 134

QEL+QATI+AEIEG 
Sbjct: 121 QELRQATIKAEIEG 134 

SEQ ID 5830 (GBS232) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 10; MW 16.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 2; MW 42kDa). 

GBS232-GST was purified as shown in Figure 207, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1875 

A DNA sequence (GBSx1983) was identified in S.agalactiae <SEQ ID 5833> which encodes the amino
acid sequence <SEQ ID 5834>. Analysis of this protein sequence reveals the following:

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 5835> which encodes the amino acid 
sequence <SEQ ID 5836>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm Certainty=0 . 1768 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 210/308 (68%) , Positives = 252/308 (81%) 

MKIHYINDYKDIQAKEDC\TjVLGYFD3LHLGHKALFDKA1CKIATEKNLKIVVLTFNETPR 6 0 
M+I YI DY+DI ++D VL+LGYFDGLH GHKALFDKA+++A ++ LK+W TF E+P+ 
MEIEYIKDYRDINQEDDTvniLGYFDG^RGEKALFDKAREVANKEGLKVWFTFTESPK 6 0 

LTFARFQPELLLHLTSPEKRSEKFQEYGVDELYLMNFTSHFSKVSSDLFIKKYIYGLRAK 12 0 
L F+RF PELLLH+T P+KR EKF +YGV++LYL++FTS FSKVSSD FI YI L+AK 



WGFDYKFGHNRT DYL RNF+G VY I+EI E KIS+T 1 



LLGY+ ST G WHGDARGRTTGFPTAN1API+ TYLPADGVY++NV++ K YR+MTS+ 



GKN+TFGG ELRLE NIFDFD +IYGE IEI WL +IR+M KF GI+DL +L+ DK A 



Query segment starts: 61, 121, 181, 241, 301
Sbjct segment starts: 1, 61, 121, 181, 241

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1876 

A DNA sequence (GBSx1984) was identified in S.agalactiae <SEQ ID 5837> which encodes the amino
acid sequence <SEQ ID 5838>. This protein is predicted to be tRNA pseudouridine 55 synthase (truB).
Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9817> which encodes amino acid sequence <SEQ ID 9818>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.



Query: 2 ITGIINLKKEAGMTSIIDAVFKLRKILIITKKIGHGGTLDPDWGVLPIAVGKATRVIEYMT 51 

+TGI+ L K GMTSHD V KLR++L TKK+GH GTLDPDV GVLP+ +G AT+V +YM+ 
Sbjct: 3 MTGILPIAKPRGMTSHDCVAKLRRLLKTKKVGHTGTLDPDVYGVLPVCIGHATKVAQYMS 52 

Query: 62 ESGKIYEGEITLGYATSTEDSSGEVISRTPLTQSDLSEDWDHAMKSFTGPITQVPPMYS 121 

+ K YEGE+T+G++T+TED SG+ + T Q E WD + +F G I Q+PPMYS 
Sbjct: 63 DYPKAYEGEVTVGFSTTTEDRSGDTVE-TKTIQQPFVmWDQVLATFVGEIKQIPPMYS 121 

Query: 122 AVKVNGKKLYEYARSGEEVERPKRQITISEFRRTSPLYFEKGICRFSFYVSCSKGTYVRT 181 

AVKV GK+LYEYAR+G VERP+R +TI R S + +E+G+CRF F VSCSKGTYVRT 
Sbjct: 122 AVKVRGKRLYEYARAGITVERPERTVTIFSLERMSDIVYEEGVCRFRFNVSCSKGTYVRT 181 

Query: 182 LAVDLGIKLGYASHMSFLKRTSSAGLSITQSLTLEEINEKYKQ-EDFSFLLPIEYGVLDL 240 

LAVD+G LGY +HMS L RT S S+ + T E+ E+ +Q E S LLPIE +LD+ 
Sbjct: 182 LAVDIGKALGYPAHMSDLVRTKSGPFSLEECFTFTEL3ERLEQGEGSSLLLPIETAILDI 241 

Query: 241 PKVNLTEEDKVEISYGR RILLENEADTLAAFYE 273 

P+V + +E + +1 +G R + NE h A Y+ 

Sbjct: 242 PRVQVNKEIEEKIRHGAVLPQKWFNHFRFTVYNEEGALLAIYK 284 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5839> which encodes the amino acid 
sequence <SEQ ID 5840>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2698 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/295 (68%), Positives = 246/295 (83%), Gaps = 2/295 (0%)

Query: 1 MITGIINLKKEAGMTSHDAVFKLRKILHTKKIGHGGTLDPDWGVLPIAVGKATRVIEYM 60

MI GIINLKKEAGMTSHDAVFKLRK+L KKIGHGGTLDPDWGVLPIAVGKATRVIEYM 
Sbjct: 1 MINGIINLKICEAGMTSHDAVFKIjRKLLQEKKIGHGGTLDPDWGVLPIAVGKATRVIEYM 60

Query: 61 TESGKIYEGEITLGYATSTEDSSGEVISRTPLTQSDLSEDWDHAMKSFTGPITQVPPMY 120

TE+GK+YEG++TLGY+T+TED+SGEV++R+ L +■ L+E++VD M +F G ITQ PPMY 
Sbjct: 61 TEAGKVYEGQVTLGYSTTTEDASGEWARSSL-PAVLTEELVDQTMTTFLGKITQTPPMY 119

Query: 121 SAVKVNGKKLYEYARSGEEVERPKRQITISEFRRTSPLYF-EKGICRFSFYVSCSKGTYV 179 

SAVKVNG+KLYEYAR+GE VERP+R++TIS F RTSPL F E G+CRFSF V+CSKGTYV 
Sbjct: 120 SAVKOTJGRKLYEYARAGESVERPRREVTISLFERTSPLNFTEDGLCRFSFKVACSKGTYV 179

Query: 180 RTLAVDLGIKLGYASHMSFLKRTSSAGLSITQSLTLEE1NEKYKQEDFSFLLPIEYGVLD 239 

RTLAVDLG LG SHMSFL+R++SAGL++ + TL EI + +++ SFLLPIEYGV D 
Sbjct: 180 RTLAVDLGRALGVESHMSFLQRSASAGLTLETAYTLGEIADMVSKQEMSFLLPIEYGVAD 239 

Query: 240 LPKVNLTEEDKVE I S YGRR I LJjENEADTIAAF YENRVI AI LEKRGNEFKPHKVLL 294 

LPK-t- + + + EIS+GRR+- L ++ IAAF+ +VIAILEKR E+KP KVL+ 
Sbjct: 240 LPKMVIDDTELTEISFGRRLSLPSQEPLIiAAFHGEKVIAILEKRDQEYKPKKVLI 294

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1877 

A DNA sequence (GBSxl985) was identified in S.agalactiae <SEQ ID 5841> which encodes the amino 
acid sequence <SEQ ID 5842>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2776 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
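
A "Final Results" block of this kind can be read into a small table and the highest-certainty compartment selected. The sketch below is illustrative only: the name `parse_final_results` is hypothetical, and the regular expression is deliberately tolerant of stray spaces inside the number, since that is how the values often appear in extracted text.

```python
import re

# Matches e.g. "bacterial cytoplasm --- Certainty=0 .2776 (Affirmative) < succ>".
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>[A-Za-z]+)[^A-Za-z]*Certainty\s*=\s*"
    r"(?P<value>[0-9][0-9. ]*)\((?P<call>[^)]*)\)",
    re.IGNORECASE,
)

def parse_final_results(text):
    """Return {compartment: (certainty, call)} for every parsable result line."""
    scores = {}
    for m in RESULT_LINE.finditer(text):
        # Remove spaces that were inserted into the number during extraction.
        value = float(m.group("value").replace(" ", ""))
        scores[m.group("site").lower()] = (value, m.group("call").strip())
    return scores

block = """
bacterial cytoplasm --- Certainty=0 .2776 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
scores = parse_final_results(block)
best = max(scores, key=lambda site: scores[site][0])
print(best, scores[best])
```

Lines whose certainty value is too damaged to parse are skipped rather than guessed.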

A related GBS nucleic acid sequence <SEQ ID 9819> which encodes amino acid sequence <SEQ ID 9820> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12871 GB:Z99109 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 38/145 (26%), Positives = 68/145 (46%), Gaps = 7/145 (4%) 

Query: 3 MKIRTATLDDSEKLVPLYQELG YAISLSEIQSILKVILTHSDYGFLIAEDNGKLLA 58 

M IR A D+ + PL+ + A L ++ LK L + + LIAE+NG+ + 

Sbjct: 1 miRCAKTSDAAAIAPLFNQYREFYRQASDLQGAEAFIiKRRLENHESVILIAEENGEFIG 60 

Query: 59 FVGYHKLYFFEKSGTYYRirJ^VWEKHRRKGIASQLINHVKQLAKTDGSEVLALNSSLK 118 

F + + Y + h V R KG +L++ K A +G++ h L + 4 

Sbjct: 61 FTQLYPTFSSVSMECRIYILNDLFWPHARTKGAGGRLIjSAAKDYAGQNGAKCLTLQT--E 118 

Query: 119 EYRQEAYHFYENLGFKKVSTGFSYY 143 

+ ++A YE G+++ TGF +Y 
Sbjct: 119 HHNRKARSLYEQNGYEE - DTGFVHY 142 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5843> which encodes the amino acid 
sequence <SEQ ID 5844>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0962 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 37/126 (29%), Positives = 64/126 (50%), Gaps = 16/126 (12%) 

Query: 18 PLYQE LGYAISLSEIQSILKVILTHSDYGFLIA--EDNGICLLAFVG YHKLYF 67 

P+ QE LGY +SL 4-+ + ++ + FL +D +LL +V Y LY 

Sbjct: 11 PMLQEINAKALGYLVSLDLLERQYERLIEDCHHYF^AYADKDTOQLLGYVHAERYETLY- 69 

Query: 68 FEKSGTYYRILALVVNEKHRRKGIASQLINHVKQIAKTDGSEVLAIMSSLKEYRQEAYHF 127 

+ +h L V ++R+GI S L+ ++ A+ +G + LNS+ +R+EA+ F 

Sbjct: 70 ASDGLNLLGLAVLPAYQRRGIGSALLRALESQARQEGIAFIRI1NSA--SHRKEAHAF 124 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1878 

A DNA sequence (GBSx1986) was identified in S.agalactiae <SEQ ID 5845> which encodes the amino 
acid sequence <SEQ ID 5846>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1659 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



RGD motif 28-30 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF30776 GB:AE002133 conserved hypothetical [Ureaplasma 
urealyticum] 

Identities = 106/440 (24%) , Positives = 206/440 (46%) , Gaps = 65/440 (14%) 

Query: 13 FAINESEYHQLLEQIRGDAFDKEVSERLEKERL I L3EQAKNQLQEWVE - KDKEIAKLQY 71 
F N+ +Y++L++Q +D ■ LEK+R L E+ KN+ + + KD + K 
Sbjct: FLANDRDYNELVKQ RYD LEKQRDELKEKLKNEGNKAIAHFKDSDEYKNLI 120 

[Alignment rows for Query residues 72 to 329 are not recoverable from the source. Surviving row labels: Query 72 / Sbjct 121; Query 124 / Sbjct 181; Query 157 / Sbjct 241; Query 217; Query 270 / Sbjct 359. Surviving fragments: "--RDAIQNQLHIQ--" and "+E K K+ + K VGE LE + + +F++ + P+ F K N".] 

Query: 330 ALMKEQNIDITHFEEDLDIFKNAFAKN-YNSAEKNFQKAIDEIDKSIKRMEAV-KAALTT 387 

+++ + D EE+LD K N + +K ID+ IK+ E++ ++A 

Sbjct: 413 QIIRYE--DRAKIEENLDELKKDIVDNTLKYINDKTKKIIDDSKAIIKKAESIEESAEDI 470 

Query: 388 SENQLRLANNKLDDVSVKKL 407 

+L K+++++++K+ 
Sbjct: 471 IKKKUSTTLKKKINELTIRKI 490 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5847> which encodes the amino acid 
sequence <SEQ ID 5848>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3192 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 310/445 (69%) , Positives = 352/445 (78%) , Gaps = 22/445 (4%) 

Query: 1 MMEIKCPHCGTAFAINESEYHQLLEQIRGDAFDKEVSERLEKERLILGEQAKNQLQEVW 60 

MNEIKCPHC T F INESEY QLLEQ+RG AFD+E+ +RL E +L E+AK+QL EW 
Sbjct: 1 MNEIKCPHCHTLFTINESEYSQLLEQVRGQAFDEELKKRLINEIALLEEKAKHQLHEWA 60 



N +LA QL IK 



Query: 99 Dl«MLEI)LENQIDRLRLEHENSLQEALTKvERERDATQNQLHIQEKEKDLALASVKSDYEV 158 

D + L NQ+D+L LE + + Q L +E+ERD I+NQL +Q KE +L+LASV+SDYE 
Sbjct: 121 DKEWSLTNQLDKLALEKDATFQSKIiATIEKERDGIKNQLALQAKESELSLASVRSDYEA 180 

Query: 159 QLKAANEQVEFYKNFKAQQSTKAVGESLEHYAETEFNKVRHLAFPNAYFEKDNTLSSRGS 218 

QLKAANEQVEFYKNFKAQQSTKA+GESLE YAETEFNKVR AFPNA F KDN I.SSRGS 
Sbjct: 181 QLKAANEQVEFYKNFKAQQSTKAIGESLELYAETEFNKVRSYAFPNASFVItDNQLSSRGS 240 

Query: 219 KGDFIYREKDENDLEFLSIMFEMKNESDDTIKKHKNEDFFKELDKDRREKSCEYAVLVTM 278 

KGD+IYRE D N +E LSIMFEMKNE+D T KHKN DFFKELDKDRREK CEYAVLV+M 
Sbjct: 241 KGDYIYREVDANGVEILSIMFEMKNEADTTKTICHKNSDFFKELDKDRREKDCEYAVLVSM 300 

Query: 279 LEAnNDYYNTGI TOVSHKYPKMYVIRPQFFIQLIGILRNAAIJWLKYKQELALMKEQNID 338 

LEADNDYYNTGIVDVSH+Y KMYV+RPQ FIQLIGILRNAALN+L YKQELAL+KEQNID 
Sbjct: 301 LEADNDYYNTGIVDVSHEYQKMYVVRPQLFIQLIGILRNAALNSLHYKQELALVKEQNID 360 

Query: 339 ITHFEEDLDIFKNAFAKNYNSASKNFQKAIDEIDKSIKRMEAVKAALTTSENQLRLANNK 398 

ITHFEEDLD FKNAFAKNY SAS NF+KAIDEIDKSIKRME VK LTTSENQLRLANNK 
Sbjct: 361 ITHFEEDLDQFKHAFAKNYQSASJOTKKAIDEIDKSIKRMEEVKRFLTTSENQLRLANNK 420 

Query: 399 LDDVSVKKLTRKNPTMKAKFDALKD 423 

D+DVSVKKLTR+NPTM+ KF+ALKD 
Sbjct: 421 LEDVSVKKLTRQNPTMREKFEALKD 445 

SEQ ID 5846 (GBS304) was expressed in E.coli as a His-fusion product. The purified protein is shown in 
Figure 206, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1879 

A DNA sequence (GBSx1987) was identified in S.agalactiae <SEQ ID 5849> which encodes the amino 
acid sequence <SEQ ID 5850>. This protein is predicted to be an unnamed protein product. Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1845 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5851> which encodes the amino acid 
sequence <SEQ ID 5852>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2492 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 113/180 (62%) , Positives = 141/180 (77%) 

Query: 16 LSELVDCFKGKAVPSKAEAGD1RIINLSDMSPLGIDYHNLRTFQDEQRSLLKYLLQEGDV 75 

L +VDCFKGKAV SK GD+ +1NLSDM LGI YH LRTFQ ++R LL+YLL++GDV 
Sbjct: 18 LGTVVDCFKGKAVSSKVVPGDVGLINLSDMGTLGIQYHQLRTFQMDRRQLLRYLLEDGDV 77 

Query: 76 LIASKGTVKKVAIFEEQDYPWASANITILRPTQHIRGYYLKLFFDSEEGQQALENANKG 135 

LIASKGT+KKV +F +Q+ WAS+NIT+LRP + +RGYY+K F DS GQ L+ A+ G 
Sbjct: 78 LIASKGTLKKVCVFHKQNRDWASSNITVLRPQKLLRGYYIKFFLDSPIGQALLDVADHG 137 

Query: 136 KAVMNISTKELLNIAIPSIPLFRQDYLIQRYKQGLKDYKRKIARAEQEWERIQNDIRQQL 195 

K V+N+STKELL+I IP IPL +QDYLI Y +GL DY RK+ RAEQEWE IQN+I++ L 
Sbjct: 138 KDVINLSTKELLDIPIPVIPLVKQDYLXNHYLRGLTDYHRKLNRAEQEWEYIQNEIQKGL 197 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1880 

A DNA sequence (GBSx1988) was identified in S.agalactiae <SEQ ID 5853> which encodes the amino 
acid sequence <SEQ ID 5854>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -7.43 Transmembrane 62 - 78 ( 55 - 82) 
INTEGRAL Likelihood = -2.87 Transmembrane 130 - 146 ( 130 - 150) 
INTEGRAL Likelihood = -1.28 Transmembrane 37 - 53 ( 37 - 53) 

Final Results 

bacterial membrane --- Certainty=0.3972 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9347> which encodes amino acid sequence <SEQ ID 9348> 
was also identified. 
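
The INTEGRAL lines reported for this protein can be collected into a list of predicted transmembrane segments ordered along the sequence. This is a sketch under stated assumptions: `parse_tm_segments` is a hypothetical helper, and the second pair of coordinates is stored under the neutral name `paren_span` because the report itself does not say what the parenthesised range denotes.

```python
import re

# Matches lines such as:
#   INTEGRAL Likelihood = -7.43 Transmembrane 62 - 78 ( 55 - 82)
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segments(text):
    """Return the reported segments as dicts, sorted by start position."""
    segments = []
    for likelihood, start, end, p_start, p_end in TM_LINE.findall(text):
        segments.append({
            "likelihood": float(likelihood),
            "span": (int(start), int(end)),            # first coordinate pair
            "paren_span": (int(p_start), int(p_end)),  # pair given in parentheses
        })
    return sorted(segments, key=lambda seg: seg["span"][0])

report = """
INTEGRAL Likelihood = -7.43 Transmembrane 62 - 78 ( 55 - 82)
INTEGRAL Likelihood = -2.87 Transmembrane 130 - 146 ( 130 - 150)
INTEGRAL Likelihood = -1.28 Transmembrane 37 - 53 ( 37 - 53)
"""
for seg in parse_tm_segments(report):
    print(seg["span"], seg["paren_span"], seg["likelihood"])
```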

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA22372 GB:AL034446 putative transmembrane protein 
[Streptomyces coelicolor A3(2)] 

Identities = 38/139 (27%), Positives = 64/139 (45%), Gaps = 5/139 (3%) 

Query: 15 SASvEILCRGWLLPVSATKYSKIVSVSISSIFFGLLHSANNHVSLISIFNLCL-FGLFLS 73 
+A+ E++ RG L + +++ ++ + FGL+H N +L + + G L+ 

Sbjct: 143 AATEEWFRGVLFRIIEEHIGTYlALGLTGLVFGI^lHLL}ffiDATLWGALAIAIFAGFMLA 202 

Query: 74 LYVILKGNIWGACGIHGAWNCVQGSVFGIEVSGEPMLSNSLVHVKTYGADWISGGKFGVE 133 

N+W G+H WN G W VSG S L+ G ++GG FG E 

Sbjct: 203 AAYAATRNLWLTIGVHFGWNFAAGGVFSTWSGNGD-SEGLLDATMSGPKLLTGGDFGPK 261 

Query: 134 GSMIT SIVLIVACYWL 149 

GS+ + ++L + WL 
Sbjct: 262 GSVYSVGFGVLLTLVFLWL 280 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1881 

A DNA sequence (GBSx1989) was identified in S.agalactiae <SEQ ID 5855> which encodes the amino 
acid sequence <SEQ ID 5856>, which is a methylase gene homolog. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2192 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

RGD motif: 264-266 

A related GBS nucleic acid sequence <SEQ ID 9929> which encodes amino acid sequence <SEQ ID 9930> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA87672 GB:AB016260 Hypothetical gene, methylase gene homolog 
[Agrobacterium tumefaciens] 
Identities = 358/1238 (28%) , Positives = 595/1238 (47%) , Gaps = 99/1238 (7%) 

Query: 1072 KEVARIKGMTOIRNAYQEVIAIQRYYDYDKETFNHLI^KLNRTYDSFVKHYGYENSAV-- 1129 

K V 1+ ++ IR+A +EV+ Q + L +L + SFV+ +G +N 

Sbjct: 497 KHVRIIRKLIPIRDAVREVLKAQEL DRPWKDLQVRLRVAWSSFVRDFGPINHTTVS 552 

Query: 1130 NRNLFDSDDKYSLLASLEDESL--DPSGKSVIYTKSLAFEKAL 1170 

N F D L+AS+ED L D + I+T E+ + 

Sbjct: 553 ITEDPESGETRESHRRPNLQPFADDPDCWLVASIEDYDLENDTAKPGAIFT ERVI 607 

Query: 1171 WPEKEVKKVHTALDAI^SSIiaDGRGVDFAYMMSIYQVESQMTLIEELGDLIMPDPEKYL 1230 

P V + +A DAL L + VD ++ + + ++ ELG I DP 
Sbjct: 608 SPPAPPV- -ITSAADALAWLNERGRVDLDHIAELLHRDPD-DWAELGSAIFRDP 660 

Query: 1231 NGELTYVSRQDFLSGDWTKLEV^TDLFi/KQDNQDFKWSHYAGLLEAIKPARITLADIDYR 1290 

+ ++ +LSG V KL+V + D ++ L ++P + +DI R 

Sbjct: 661 -ADGSWQMADAYLSGPVRDKLKVAEAAAALDPV YNRNVTALAGVQPVDLRPSDITAR 716 

Query: 1291 IGSRWIPLAVYGKFAQETFMGKAYELSDQ-EVATVLEVSPIDGVITyQSKFAYTYSNATD 1349 

+G4- WIP A F +E MG 4- E+A+ + G + AT TD 

Sbjct: 717 LGAPWI PAAD WAFVKE - MMGTD I RIHHMPELASWTVEARQLGYLA AGTSEWGTD 770 

Query: 1350 RSLGVPASRYDSGRKIFE^IMSNQPTITKQVVEGDKKKWrDVEKTTVLRAKETHLQEL 1409 

R ++ + LNS PI + +GD ++ V +V T + K +++ 

Sbjct: 771 RR HAGELLSDALNSRVPQIFDTIFJ3GDSERRVLNVVDTEAAKEKLHKIKDA 821 

Query: 1410 FQGFVAKYPEVQQMIEDTYNRLYNRTVSKSYDGSHLTIDGLAQNISLRPHQKNAIQRIVE 1469 

FQ ++ P+ + YN +N + + G HL + G + L HQK I RI + 
Sbjct: 822 FQRWIWSDPDRTDRLARVYNDRFNNIAPRKFSGDHLNLPGASGAFVLYGHQKRGIWRIIS 881 

Query: 1470 EimALlAHEVGSGKTLTMLGAGFKLKEmMUHKPLYWPSSLTAQFGQEIMKFFPTKKVY 1529 

LAH VG+GKT+TM + + + LG++ K + WP AQ +E + +PT ++ 
Sbjct: 882 SGSTYIAHAVGAGKTMTMAASIMEQRRLGLIAKAMQWPGHCLAQAAREFLALYPTARIL 941 

Query: 1530 VTTKKDFAKAKRKQFVSRIITGDYDAIVIGDSQFEKIPMSREKQVTYINDKLEQLREIKL 1589 

V + +F+K KR +F+SR T +DAI+I S F I + + I+D+LE + L 
Sbjct: 942 VADETOFSKDKRARFLSRAATATWDAIIITHSAFRFIGVPARFESQMIHDELELYETLLL 1001 

Query: 1590 GSDSDYT\'--KEAERSIKGLEHQLEELQKLERDTFIEFENLGIDFLFVDEAHHFKNIRPI 1647 

+ + V K ER +GL+ +LE L +D + +G+D + VDEA F+ + 
Sbjct: 1002 KVEDEDEVSRKRLERLKEGLQERLEALST-RKDDLLTIAEIGVDQIIVDEAQEFRKLSFA 1060 

Query: 1648 TGLGOTAGITNTTSKKNVDMEMKTOQVQAEHGDRIvTWFATGTPVSNSISELFTMMDYIQP 1707 

T + + G+ S++ D+ +K R +4 + R +V A+GTP++N++ E+F++ + 
Sbjct: 1061 TNMSTLKGVDPNGSQRAWDLYVKSRFIETINPGRALVLASGTPITNTLGEMFSVQRLMGH 1120 

Query: 1708 DVLERYLVSNFDSWVGAFGNIENSMELAPTGDKYQPKKRFKKFVNLPELMRIYKETADI- 1766 

LE + FD+W FG+ +EL P+G KY+P RF FVN+PEL+ +++ AD+ 
Sbjct: 1121 AALEERGLHEFDAWASTFGDTTTELELQPSG-KYKPVSRFASFVNVPELIAMFRSFADW 1179 

Query: 1767 ---QTSDMLDLP-VPEAKIIAVESELTQAQKYYLEELVKRSDAIICSGS--VDPSRDNMLK 1820 

+ + + p + + V S+ TQA K++ h +R AI+ P D +L 

Sbjct: 1180 MPADLREYVK7PAISTGRRQIVTSKPTQAFKHHQMVLAERIKAIEERERPPQPGDDILLS 1239 

Query: 1821 ITGEARKLAIDMRLIDPTYSLSDNQKILQWDNVERIYRDGAGDK AT 1867 

+ + R AID+RL+D + K+ +V N RI++ AG A 

Sbjct: 1240 VITDGRHAAIDI^LVDADNDNEPDNKLKIJLVSNAFRIWI^ATAGSVYLRHDSKPFEVPGAA 1299 

Query: 1868 QMIFSDIGTPK-SKEEGFDVYNELI<ELFVDRGIPKEEIAFVHDANTDEICKNSLSRKVNSG 1926 

QMIFSD+GT K GF Y ++D + G+P EIAF4 D E K L V +G 
Sbjct: 1300 QMIFSDLGTISVEKTRGFSAYRWIRDELIRLGVPASEIAFMQDFKKSEAKQRLFGDVRAG 1359 

Query: 1927 ETOILmSTEKGGTGLlWQSRMKAVHYLDVPWRPSDIVQRNGRLIRQGNMHQEVDIYHYI 1986 

VR L+ S+E GTG+NVQ R+KA+H+LDVPW PS I QR GR++RQGN H EVDI+ Y 
Sbjct: 1360 RVRFLIGSSETMGTGVWQLRLKALHHLDVPWLPSQIEQREGRIVRQGNQHDEVDIFAYA 1419 

Query: 1987 TKGSFDNYLWQTQENKLKYITQIMTSKDPVRSAEDIDE-QTMTASDFKALATGNPYLKLK 2045 

T+GS D +WQ E K ++I ++ +R EDI E Q + KA+A+G+ L K 
Sbjct: 1420 TEGSLDATMWQNIffiRKARFIAAALSGDTSIRRLEDIGEGQANQFAMAKAIASGDQRLMQK 1479 

Query: 2046 MELENELTVLENQKRAFNRSKDEYRHTISYSEKHLPIMEKRLSQYDKDIAQSLATKSQDF 2105 
LE ++ LE + A + R + +E+ + + +R+++ +DI + + T +DF 
Sbjct: 1480 AGLEADIARLERLRAAHIDDQHAVRRQLRDAERDIEVSTRRIAEIGQDITRLVPTTGEDF 1539 

Query: 2106 vmFDNQA^OTlAEAGDYLRK-LITYNRSETKEWTLASFRGFDLKM-TTRGASEPLPET 2163 

M + R EAG L K ++T + + +AS GF+L+ R + T 
Sbjct: 1540 TMTVAGKDYSERKEAGRALMKEILTLVQLSPEGEAVIASIGGFELEYHGQRYGKDGYRYT 1599 

Query: 2164 ISLMIVGDNQYTVALDLK-SDVGTIQRIS^1AIDHIIDDQEKIQELVKDLKDKLRVAKVEV 2222 

L G + Y + L+ ++G + R+ +A+D ++E+ ++ + D + +Ii + 
Sbjct: 1600 TMLKRTGAD-YEIELPVTVTPLGAVSRLEHALDDFDGER3RYRQRLGDARRRIASYQSRG 1658 

Query: 2223 DKVFPKEEDYQLVKAKYDVLAPLVEKEA3IEEIDAALA 2260 
+ +++ L EK ++ E++ ALA 

Sbjct: 1659 E GSEFAFAGELAEKHRQL&EVETALA 1684 

Identities = 99/271 (36%) , Positives = 153/271 (55%) , Gaps = 10/271 (3%) 

Query: 607 RDKVETNIVAIRLVKMLEVEHRNASPSEQELLAKYVGWGG--LANEFFD DYNPKF 659 

+D+ NI AIRL +E R A+ EQE L ++ G+G UN F ++ + 

Sbjct: 80 KDRARDNIAAIRLAAEIEASERPATREEQSTLIRFTGFGASDLANGVFRRPGELEFRKGW 139 



Query: 660 £ 

+ +L+ V + +Y+ + + + A++T ++R +W L+R G+ GG++L+P +GTG 
Sbjct: 140 DEIGSDLEDAVGETDYASLARCTQYAHFTPEFIVRAIWSGLQRLGWRGGRVLEPGIGTGL 199 

Query: 720 FFAAMPKHLREKSELYGVELDTITGAIAKHLHPNSHIEIKGFETVAFNDNSFDLVISNVP 779 

F A MP+ LR+ S + GVELD +T I+LP+I F SFDL I N P 

Sbjct: 200 FPALMPEALRDLSHVTGVELDPVTACIVRLLQPRARILTGDFARTEL-PASFDIAIGNPP 258 

Query: 780 FANIRIADNRYDRP--YMIHDYFVKKSLDLLHDGGQVAIISSTGTMDKRTENILQDIRET 837 

F++ + +R R +HDYFV +S+DLL G A ++S+GTMDK Q I T 

Sbjct: 259 FSDRTVRSDRAYRSLGLRLHDYFVARSIDLLKPGAFAAFVTSSGTMDKADSAARQHIATT 318 

Query: 838 TEFLGGVRLPDSAFKA.IAGTSVTTDMLFFQK 868 

+ + +RLP+ +F+A AGT V D+LFF+K 
Sbjct: 319 ADLIAMRLPEGSFRADAGTDVWDILFFRK 349 

SEQ ID 5856 (GBS327N) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 148 (lanes 8-10; MW 140kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 148 (lanes 11-13; MW 115kDa) and in 
Figure 182 (lane 8; MW 115kDa). 

Purified GBS327N-GST is shown in Figure 243, lane 5; purified GBS327N-His is shown in Figure 235, 
lane 5. 

GBS327C was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is 
shown in Figure 148 (lane 14; MW 73kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1882 

A DNA sequence (GBSx1990) was identified in S.agalactiae <SEQ ID 5857> which encodes the amino 
acid sequence <SEQ ID 5858>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3656 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1883 

A repeated DNA sequence (GBSx1991) was identified in S.agalactiae <SEQ ID 5859> which encodes the 
amino acid sequence <SEQ ID 5860>. This protein is predicted to be a giant membrane protein. Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3698 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG19662 GB:AE005054 calcium-binding protein homology; Cbp 
[Halobacterium sp. NRC-1] 
Identities = 22/43 (51%), Positives = 29/43 (67%), Gaps = 1/43 (2%) 

Query: 9 KDSDQDGLTDAQEDAL-GTDPQSVDTDGDGQADLEELQSGHSP 50 

+D+D DGL+D E+ + GTDP DTDGDG D EL++G P 
Sbjct: 198 RDTDDDGLSDGVEVRVAGTDPTERDTDGDGVDDAAELRAGSLP 240 
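
The identity and gap counts in the summary line can be recomputed directly from the two aligned rows. The following is a minimal sketch: `alignment_stats` is a hypothetical helper, the Query row is used with its extraction spacing removed (an assumption about the intended residues), and "Positives" is not recomputed because that would require the scoring matrix used in the original search, which is not stated here.

```python
def alignment_stats(query, sbjct):
    """Count identical columns and gapped columns in two aligned rows of equal length."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    # A column is an identity when both rows carry the same residue (gaps never count).
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    # A column is gapped when either row carries a gap character.
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, gaps, len(query)

query = "KDSDQDGLTDAQEDAL-GTDPQSVDTDGDGQADLEELQSGHSP"  # Query residues 9-50
sbjct = "RDTDDDGLSDGVEVRVAGTDPTERDTDGDGVDDAAELRAGSLP"  # Sbjct residues 198-240

identities, gaps, length = alignment_stats(query, sbjct)
print(f"Identities = {identities}/{length} ({identities * 100 // length}%), "
      f"Gaps = {gaps}/{length} ({gaps * 100 // length}%)")
# prints: Identities = 22/43 (51%), Gaps = 1/43 (2%)
```

The recomputed figures agree with the summary line above, which supports the de-spaced reading of the Query row.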

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1884 

A DNA sequence (GBSx1992) was identified in S.agalactiae <SEQ ID 5861> which encodes the amino 
acid sequence <SEQ ID 5862>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.39 Transmembrane 1609 - 1625 (1609 - 1625) 
INTEGRAL Likelihood = -1.81 Transmembrane 30 - 46 ( 29 - 46) 

Final Results 

bacterial membrane --- Certainty=0.1956 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

LPXTG motif 1600-1604 
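
Motif positions such as the one above can be located with a simple pattern search. The sketch below is illustrative: `find_motif` is a hypothetical helper, positions are reported 1-based and inclusive to match the notation used in this document, and the test fragment is the stretch beginning at Query residue 1596 in the antigen I/II alignment of this Example. The second call uses a made-up toy sequence.

```python
import re

def find_motif(sequence, pattern, first_position=1):
    """Return (start, end, matched_text) for every occurrence, including overlapping ones.

    `pattern` is a regular expression over one-letter residue codes, and
    `first_position` is the residue number of sequence[0].
    """
    hits = []
    # The lookahead lets overlapping occurrences be reported as well.
    for m in re.finditer(f"(?=({pattern}))", sequence):
        start = first_position + m.start()
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits

# LPXTG: leucine, proline, any residue, threonine, glycine.
fragment = "PASILPNTGEQES"  # fragment starting at Query residue 1596
print(find_motif(fragment, "LP.TG", first_position=1596))

# RGD on a hypothetical toy sequence (not taken from this document).
print(find_motif("AARGDKK", "RGD"))
```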

The protein has homology with the following sequences in the GENPEPT database. 
!GB:X57841 antigen I /II [Streptococcus sobrinus] (v. . . 



Query: 23 KSKKYRTLCSVALGTtWTAWAWGGTVAEADEVTTSV DTTIQRTE- -NPATNLPEA 76 

K K RTL LGT + A A G A A+E +T+ DT + TE NPATNLP+ 

Sbjct: 23 KVKSGRTLSGALLGTAII^GA--GQKALAEETSTTSTSGGDTAWGTETGNPATNLPDK 80 

Query: 77 QPNP VSEQTESMASTGQSNGAIAVTVPHDTVT QAVE 112 

Q NP V T + +S VTV D + + 

Sbjct: 81 QDNPSSQAETSQAQARQKTGAMSVDVSTSELDEAAKSPQEAGVWSQDATVNKGTVEPSD 140 

Query: 113 EAKAEGVSTVEDSPMDLGNTRSAVET NQQIS K 144 

EA + +D + + A E NQ+I+ K 

Sbjct: 141 EANQKEPEIKDDYSKQAADIQKATEDYKASVAANOAETDRINQEIAAKKAQYEQDLAANK 200 

Query: 145 AD ADTQKQVETINEVTK TYKADKATYESNKARIEQEN 181 

A+ A QK + I + YAK Y+ AR++ N 

Sbjct: 201 AEVERSLMRMRKPREIYEAKXAQNQKDLAAIQQANSDSQAAYAAAKEAYDKEWARVQAAN 260 

Query: 182 KELSQAYEGANQTOKETNAWVDTKWJDLKARYADADVTVKEQ WSSGNGTSVL 234 

+AYE A N + ++++RAAD K +GN + 

Sbjct: 261 AAAKKAYEEAIAAin'AKM)QIKAEIE^IQQRSAKADYEAKIiAQYEKDLAAAQAGNAANEA 320 

Query: 235 DY TNYGKAVETIQSTNEQAUADY LTKKTKADDIVAKNQAIQKENEA 280 

DY Y + + +Q+ N A Y K I A+N+AIQ+ +A 

Sbjct: 321 DYQAKKAAYEQELARVQAANAAAKQAYEQALAANSAKNAQITAENEAIQQNAQAKADYEA 380 

Query: 281 GLANAKADNEAIERRNQAGQAAVDAEN- - -RAGQAAVDQANQEKQQLVSDRAA 330 

LA A++ N A E Q AA + E +A AA OA +++ Q + + A 
Sbjct: 3 81 KIAQYQKDIJ^QSGNAANEADYQEKI^YEKEl^VQAANAAAKQAYEQQVQQANAKNA 440 

Query: 331 EIEAITKRNKEKEAAARKENEAIDAYNTKEMERYQRDLAEIS 372 

EI + +E+ A A+ + E + +E+ +Y++DLAE 
Sbjct: 441 EITEANRAIRERNAKAKTDYELKLSKYQEEJAQYKKDIJffiYPAKLQAYQDEQAAIKAALA 500 

Query: 373 KGEEGYISEALAQALNLNNGEPQAQHGAITRN 404 

K E+G +SE AQ+L + + EP AQ +T 
Sbjct: 501 ELEKHKMDGI^SEPSAQSL-WDLEPNAQVALOTDGKLLKASALDEAFSHDEKNYNNHL 559 

Query: 405 --PDQI ISTGDALLGGYSRILDSTGF FVYDMFKTGETLS 441 

PD + +++ L G + D G+ F + K G++ + 

Sbjct: 560 LQPDNLNVTYLEQADDVASSVELFGNFG DKAGWTTTVSNGAEVKFASVLLKRGQSAT 616 

Query: 442 FNYQNLQHARFDGKKISRVTYDITNLVSPAG TNAVKLWPNDPTEGFIAYRNDGN 496 

Y NL+++ ++GKKIS+V YTVP TVL + DPT G A G 

Sbjct: 617 ATYTNLKNS YYNGKKI SKWYKYT- -VDPDSKFQNPTGNVWLGI FTDPTLGVFASAYTGQ 674 

Query: 497 GDWRTD KNEFRVVAKYYLEDGSQVTFSKEKPGWTHSSLNHiroiGIjEYVKDSSGKFV 553 

+ T K EF 
Sbjct: 675 NEKDTSIFIKNEF-- 

Query: 554 PINGSTVQVTN EGLARSLGSNRASDLNLPEEWDTTS SRYAYKGAI V 599 

I+GS++ N EG + RAS+ WD+ + ++ GA 

Sbjct: 728 KISGSSIGEKNGMIYATDTLNFKKGEGGSLHTMYTRASEPG- - SGWDSADAPNSWYGAGA 785 

Query: 600 STVTSGNTY TVTFGQGDMPQNVGL SYWFALN -- 630 

++ N Y T +MPQ G + W++LN 

Sbjct: 786 VRMSGPNOTITLGATSATNVLSLAEMPQVPGKBOT'AGKKPNIWYSLNGKIRAVNVPIOT 845 

Query: 631 - -TLPVARTVTPYSPKPHVTVEL EPIPEPITVTPDIYTPKTFTPEKPVTFT 679 

P P P V EL EP EP TP P PEKPV T 

Sbjct: 846 EKPTPPVEPTKPDEPTYEVEKELVDLPVEPKYEP-EPTPPSKNPDQSIPEKPVEPTYEVE 904 

Query: 680 PKPLDEWQPSLTLTKVT LPVKPIPKELPTPP QVPTV 716 

P P++ + T 4- T PV+P + LPTPP VPTV 

Sbjct: 905 KELEPAPVEPSYEKEPTPPQSTPDQEEPEKPVEPSYQSLPTPPVEPVYETVPGPVSVPTV 964 1 

Query: 717 HYHAYRLTTTSEIMKEVWSDQANLHEKTVAKDSTVIYPLTVDALSPNRAQTTSLIFEDY 776 

YH Y+L + KE+ N D ++ + VAK STV + L L R +TTS + D 

Sbjct: 965 RYHYYKLAVQPGVTKEIKNQDDLDIDKTLVAKQSTVKFQLKTADLPAGRPETTSFVLMDP 1024 

Query: 777 LPAGYLFDKETTQKENGNYVLSFDETKNFVTLTAKENLLQEVNKDLTQVYQLTAPKLYGS 836 

LP+GY + E T+ + + S+D + VT TA L +N+DLT+ P + G 

Sbjct: 1025 LPSGYQIjNLFATKVASPGFEaSYDAMTHTVTFTATAETIiAAIjNQDLTKAVATIYPTWGQ 1084 , 

Query: 837 VQNDGATYSNSYKLLLNKGTTNAYTVTSNWTVRTPG DGETTTLITPDKNNENAD 891 

V NDGATY+N++ L++N +AY + SN+V V TPG D + ITP K N+N + 

Sbjct: 1085 VLNDGATYTNNFTLMVN DAYGIKSNIVRVTTPGKPNDPDNPSNNYITPHKVNKNEN 1140 

Query: 892 GVLIMTWALGTTNHYRLTWDLDQYKGDRSAKETIARGFFFVDDYPEEVLDWENGTAI 951 

GV+I+ V GTTN+Y LTWDLDQYKGD+SAKE I +GFF+VDDYPEE LD+ + + 
Sbjct: 1141 GWIDGKSVLAGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEALDLRTDLIKL 1200 

Query: 952 TTLDGQKVSGIWKNYASrJ^PKBLQDKIARAKITPTGAFQVFMPDDNQAFYDQYVQTG 1011 

T +G+ V+G++V +YASL AP +QD L +A I P GAFQVF DD QAFYD YV TG 
Sbjct: 1201 TDANGKAVTGVSVADYASLEAAPAAVQDMLKKANIIPKGAFQVFTADDPQAFYDAYVVTG 1260 

Query: 1012 TSIALLTKMWKDSLYGQTKTYTNKAYQVDFGNGYETKEVTOTLVSPEPKKQ-NIMKDKV 1070 

T L ++T MTVK + +Y N+AYQ+DFGNGYE+ V N + P+K L D 

Sbjct: 1261 TDLTIVTPMWKAEMGKTGGSYEI^YQIDFGNGYESNLVVNNVPKINPEKDVTLTMDPA 1320 

Query: 1071 D, INGKPMLVGTQNHYTLSWDLDQYRGIKADNSQIAQ3FYFVDDYPE EALLPD 1122 

D ++G+ + + +Y L + I AD+++ + F DDY + 

Sbjct: 1321 DSTNVDGQTIALNQVFNYRLIGGI IPADHAEELFEYSFSDDYDQTGDQYTGQYKA 1375 

Query: 1123 EAAIQFVTSDGKTV-SGITVKSY- -SQLLEAPKTLQAAFSKQKIQPKGAFQVFMPE 1175 

A + DG + +G + SY 4-Q+ EA + F + ++ F E 

Sbjct: 1376 FAKVDLTLKDGTIIKAGTDLTSYTEAQVDEAUGQ:WTFKEDFLRSVSVDSAFQAE 1431 

Identities = 209/442 (47%), Positives = 280/442 (63%), Gaps = 27/442 (6%) 

Query: 1198 TVLETMI^SGKSY-ENVAYQVDFGQAYETOTVTNFVPK VTPHKSNTNQ 1244 

TV+ +LN G +Y N V+ ++N V P +TPHK N N+ 

Sbjct: 1080 TWGQVLNDGATYTI^FTLMVNDAYGIKSNIVRVTTPGKPN^ 1139 

Query: 1245 EGISIDGKTVLPNIVNYYKIVLDYSQYKDMVVTDDvIAKGFYMVDDYPEEALTLNPDGIQ 1304 

G+ IDGK+VL T NYY++ D QYK +++ KGF+ VDDYPEEAL L D 1 + 

Sbjct: 1140 NGWIDGKSVIAGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEALDERTDLIK 1199 

Query: 1305 VlDI<IDGimVSGISVSTYASLSEAPKWQDAMAKRQFTPKGAIQVLSSDDPKVFYDTYVKT 1364 

+ D +G V+G+SV+ YASL AP VQD + K PKGA QV ++DDP+ FYD YV T 
Sbjct: 1200 LTDANGKAVTGVSVADYASLEAAPAAVQDMLKKANI I PKGAFQVFTADDPQAFYDAYWT 1259 

Query: 1365 GQTLVVTLiPMTOKNELTKTGGQYEOTAYQIDFGIiAYVTETVvWWPKLDPQKDVVIDLSH 1424 

G h + PMTVK E+ KTGG YEN AYQIDFG Y + WNNVPK++P+KDV + + 
Sbjct: 1260 GTDLTIVTPMTVKAEMGKTGGSYENRAYQIDFGNGYESNLVVNNVPKINPEKDVTLTMDP 1319 

Query: 1425 KDA-SLDGKEVALHQTFNYRLVGAMIPSNRATDLFEYGFEDNYDEKHDEYNGVYRSYLMT 1483 

D+ ++DG+ +AL+Q FNYRL+G +IP++ A +LFEY F D+YD+ D+Y G Y+++ 
Sbjct: 1320 ADST^GQTIALNQVFNYRLIGGIIPADHAEELFEYSFSDDYDQTGDQYTGQYKAFAPCV 1379 

Query: 1484 DVILKDGSVLKEGTEVTKYTLQQVDTENGLVSISFDKSFLETVSDDSAFQADVYLQMKRI 1543 

D+ LKDG+++K GT++T YT QVD NG + ++F + FL +VS DSAFQA+VYLQMKRI 
Sbjct: 1380 DLTLKDGTIIKAGTDLTSYTEAQVDEANGQIWTFKEDFLRSVSVDSAFQAEVYLQMKRI 1439 

Query: 1544 AAGQVENTYLHTVNGYVISSNTWTHTPQPEEPSPNQP TPPQPPIETIEPPV 1595 

A G NTY++TVNG SSNTV T TP+P++PSP P P Q PP 

Sbjct: 1440 AVGTFANTYVOTVNGITYSSOTVRTSTPEPKQPSPVDPKTTTTWFQPRQGKAYQPAPPA 1499 

Query: 1596 PASILPNTGEQES LLGLI 1613 

A LP TG+ + LLGL+ 
Sbjct: 1500 GAQ-LPATGDSSNAYLPLLGLV 1520 

Identities = 100/210 (47%), Positives = 137/210 (64%), Gaps = 4/210 (1%) 

Query: 1060 PKKQNLNIODKOTINGKPMLVGTQInIHYTLSWDLDQYRGIKADNSQIAQGFYFVDDYPEEAL 1119 

P K N N++ V I+GK +L GT H+Y L+WDLDQY+G K+ I +GF++VDDYPEEAL 
Sbjct: 1132 PHKVNKNENGWIDGKSVIAGTTNYYELTWDLDQYKGDKSAKEIIQKGFFYVDDYPEEAL 1191 

Query: 1120 LPDEAAIQFVTSDGKTVSGITVKSYSQLLEAPKTLQAAFSKQKIQPKGAFQVFMPEDPQA 1179 

1+ ++GK V+G++V Y+ L AP +Q K I PKGAFQVF +DPQA 
Sbjct: 1192 DLRTDLIKLTDANGKAVTGVSVADYASLEAAPAAVQDMLKKANIIPKGAFQVFTADDPQA 1251 

Query: 1180 FFESYVTKGENITIVTPMTVLETICiNSGKSYEOTAYQVDFGQAra 1239 

F+++YV G ++TIVTPMTV M +G SYEN AYQ+DFG YE+N V N VPK+ P K 
Sbjct: 1252 FYDAYVVTGTDLTIVTPMTTOAEMGKTGGSYENRAYQIDFGNGYESNLVVNNVPKINPEK 1311 

Query: 1240 SNT NQEGISIDGKTVLPNTVNYYKIV 1265 

T + ++DG+T+ K V Y+++ 

Sbjct: 1312 DVTLTMDPADSTNVDGQTIALNQA^FNYRLI 1341 

There is also homology to SEQ ID 598. 

SEQ ID 5862 (GBS76) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 17 (lane 2; MW 17.4kDa). The GBS76-His fusion product was purified (Figure 
196, lane 8) and used to immunise mice. The resulting antiserum was used for FACS (Figure 294), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1885 

A DNA sequence (GBSx1993) was identified in S.agalactiae <SEQ ID 5863> which encodes the amino 
acid sequence <SEQ ID 5864>. This protein is predicted to be an abortive infection bacteriophage resistance 
protein (abiEi). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2765 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9931> which encodes amino acid sequence <SEQ ID 9932> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB52382 GB:U36837 AbiEi [Lactococcus lactis] 
Identities = 51/206 (24%), Positives = 90/206 (42%), Gaps = 23/206 (11%) 

Query: 17 KNNGIVTNKDCKALGIPTIYLTRLEKEGIIFRVEKGIFLTQNGDYDEYYFFQYRFPKAIF 76 

KG+K + Gl YL+ + + V+KG+++ + D + FQ ++ KA+ 
Sbjct: 76 KYKGNI IRKI VRDEGISDYYLRKFVLKYNLTEVDKG VYI FPHKKKDSLFI FQQKYSKAVI 135 

Query: 77 SYISALYLQQFTDEIPQYFDVTVPRGYRF NTPPANBNI 114 

S+ +4-LYLQ D IPQ ++VP Y N N+ I 

Sbjct: 136 SHETSLYLQDVIDYIPQKIQMSVPEKYNISRIQEPHENRLTSYNYVDINSNNIMDKNIPI 195 

Query: 115 HFV-SKEYSELGMTTVPTPMGNNVRVYDFERIICDFVIHREKIDSELFVKTLQSYGNYPK 173 
+ V +K S + TV + +G +RV RID+ K + E+ +++Y 
Sbjct: 196 I^VRNKSiaPTQIErraSFLGLPLRVTSIARSIVDVLKPSHKAEEEVKEQAIKYYLERFP 255 

Query: 174 KNLAKLYEYATKMNTLEKVKQTLEVL 199 

N+ +L A N L++++ L +L 
Sbjct: 256 DNIVRLKRIAKTQNVLKELEYYLILL 281 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1886 

A DNA sequence (GBSx1994) was identified in S.agalactiae <SEQ ID 5865> which encodes the amino 
acid sequence <SEQ ID 5866>. This protein is predicted to be an abortive infection bacteriophage resistance 
protein (abiEii). Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.12 Transmembrane 260 - 276 ( 259 - 277) 

Final Results 

bacterial membrane --- Certainty=0.1447 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB52383 GB:U36837 AbiEii [Lactococcus lactis] 
Identities = 76/276 (27%) , Positives = 135/276 (48%) , Gaps = 19/275 (6%) 

Query: 14 SKNTGLTFNSVMTYYFLEVILKKLSQSSYSNHYIFKGGFLLSNVIGVESRSTVDIDFLFH 73 

++ N + + Y E L +LS S Y ++ KGGFL+ + R+T D+D 

Sbjct: 12 TRNDDIGIENYRIRYATERFLTRLSASQYKEKFVLKGGFLIGVTYNLSQRTTKDLDTALI 71 

Query: 74 QITLSEETVKQQLKEIL-ADSEEGISF^/IQSITTIKESDDYGGYRATISCQLE--NIKQV 130 

+++++ + EI D E+ + F ++ +T+ ++ Y GYRA + N + 

Sbjct: 72 DFKSDAQSIERVITEICNIDLEDQVLFKLKELTSSQDMRIYPGYRAKLKMMFPDGNTRID 

Query: 131 IHLDIATGDWTPQPITYDYKAIFDE DNFPI IAYTIETIIAEKLQTI YSRNFLNS 

LDI GD +TP+ IF+E ++AY ETI AEKL+TI +R +N+ 

Sbjct: 132 FDLDIGVGDRITPFJUCKIKIELIFNEVKGVEKQIEVIAYPKSTIQAEKLETIIjTRGKVNT 191 

Query: 186 RSKDFYDVYII.--SKLKKKDIDFNQLKNACQRTFSYRE-TELDFEKIIE LLERFK 237 

R KD+YD ++L + IF A + T+ +R T+ E++ E L E + 

Sbjct: 192 RMKDYYDFHLLLTDQENSNS I S FYY AFKNTWEFRNPTQFIDEELFEDWLFILDEILE 248 

Query: 238 SDPTQNQQWQNYSKKYSYTKGISLANVTjDEMISIiIT 273 

S + + W NY K +Y K +++ +++ E+ ++ 

Sbjct: 249 SKELKEKYWPNYIKDRNYAKHUJMDDIISEIKEFVS 284 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1887 

A DNA sequence (GBSx1995) was identified in S.agalactiae <SEQ ID 5867> which encodes the amino 
acid sequence <SEQ ID 5868>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1137 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1888 

A DNA sequence (GBSx1996) was identified in S.agalactiae <SEQ ID 5869> which encodes the amino 
acid sequence <SEQ ID 5870>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2782 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1889 

A DNA sequence (GBSx1997) was identified in S.agalactiae <SEQ ID 5871> which encodes the amino 
acid sequence <SEQ ID 5872>. Analysis of this protein sequence reveals the following: 

Possible site: 21 
15 >» Seems to have no N- terminal signal sequence 

INTEGRAL Likelihood =-10.14 Transmembrane 310 - 326 ( 301 - 334) 

Final Results 

bacterial membrane --- Certainty=0. 5055 (Affirmative) < suco 
20 bacterial outside --- Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm --- Certainty=0 .0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG38044 GB:AF295925 Orf28 [Streptococcus pneumoniae] 
25 Identities = 272/344 (79%) , Positives = 307/344 (89%) 

Query: 568 VYVNPAFYFPKVIQVOTTILPTIGQFGGDEFERAKMYDYLKSKGATNQMAAILGNWSV 627
           +YVNP FYFPKVIQ+QTTILP IGQFGGDEFERAK IY++LKS+GA+ QAIAAILGNWSV
Sbjct: 1 MYVNPQFYFPKVIQLQTTILPAIGQFGGDEFERAKHIYEFLKSCGASPQAIAAILGNWSV 60

Query: 628 ESSINPKRAEGDYLSEPVGATDSSWDDEGWL.TLNGPTIYNGRYPNILKRGLGLGQWTDTA 687
           ESSINPKRAEGDYL+PPVG WDDE VJL + GP IY+G YPNIL RGLGLGQWTDTA
Sbjct: 61 ESSINPKRAEGDYLTPPVGVPIPPWDDESWLAIGGPAIYSGAYPNILHRGLGLGQWTDTA 120

Query: 688 DGSRPJlTLLLEYAKGKHQKWYDLGIiQLDFMIl.YGDSPYYTNWLKDFFKNSGSPASLAQLFL 747
           DGS RHT LL YA+ +++KWYDL LQLDFML+GDS PYY +WLKDFFKN+GS A+LAQLFL
Sbjct: 121 DGSTRHTALIiNYARTQNKKWYDLDLQLDFMLHGDSPYYQSWLKDFFKNTGSAANLAQLFL 180

Query: 748 IYWEGNSGDKLLERQTRASEWYYQIEKGFSQENGGTAQSDPKALEAVREDLFENSIPGGG 807
           YWEGNSGDKLLERQTRA+EWYYQIEKGFSQ NGG A+SDP++LE VR DL+++S+PGGG
Sbjct: 181 TYWEGNSGDKLLERQTRATEWYYQIEKGFSQTNGGQAKSDPQSDEGVRGDLYDHSVPGGG 240

Query: 808 DGMGYAYGQCTWGVAARINQLGLKLKGKNGEKIP1ISTMGNGQDWVP.TAASLGGETGTSP 867
           DGM YAYGQCTWGVAAR+NQLGLKLKG+NGEKI 1 1 +TMGNGQDWV T++SLGGETG++P
Sbjct: 241 DGMAYAYGQCTWGVAARMNQLGLKLKGRNGEKI S 1 INTMGNGQDWVATSSSLGGETGSTP 300

Query: 868 QEGAILSFAGGGHGTPTEYGHVAFVEKVYPDGSFLISETNYNGN 911
           + GAI+SF GG HGTP YGHVAFVEKVY DGSFL+SETNY GN
Sbjct: 301 RAGAIVSFVGGTHGTPASYGHVAFVEKVYDDGSFLVSETNYGGN 344
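
For orientation, the counts in an alignment summary line such as "Identities = 272/344 (79%)" can be turned back into fractions with a few lines of code. The sketch below is illustrative only: the helper name `alignment_fractions` is hypothetical, and it makes no assumption about how the alignment program rounded the percentages it printed, so the computed fraction may differ slightly from the value in parentheses.

```python
import re

# Matches "Identities = 272/344", "Positives = 307/344", "Gaps = 3/449", etc.
FIELD_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def alignment_fractions(summary):
    """Return {field: (count, aligned_length, count / aligned_length)}."""
    fractions = {}
    for name, count, length in FIELD_RE.findall(summary):
        c, n = int(count), int(length)
        fractions[name] = (c, n, c / n)
    return fractions


summary = "Identities = 272/344 (79%) , Positives = 307/344 (89%)"
for name, (c, n, frac) in alignment_fractions(summary).items():
    print(f"{name}: {c}/{n} = {frac:.1%}")
```

Both counts share the same denominator, the aligned length including any gap positions.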

SEQ ID 5872 (GBS74d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 121 (lanes 3 & 4; MW 95.5kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 121 (lanes 5-7; MW 70.5kDa) and in
Figure 179 (lane 9; MW 70.5kDa).

GBS74d-His was purified as shown in Figure 233, lanes 7-8.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1890 

A DNA sequence (GBSx1998) was identified in S.agalactiae <SEQ ID 5873> which encodes the amino
acid sequence <SEQ ID 5874>. This protein is predicted to be TrsE-like protein. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5526 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

Gaps = 1/782 (0%)

[Alignment blocks for Query 1-659 / Sbjct 3-662: sequence lines not recoverable; surviving match lines follow.]

LGDVNYQTVGL+DKGAI+EKYSDLI SLDD+TNFQLTIFN+++NLEKFR S+LY +EDG
+D+YR ELNRMM+ NL++GENNFSAVK +SFG+ D PK A+RSLSQIGEYFKSGFSEID
L GEERVN+IiADMLRGE+HLPFSY+DLT SGQ+T+HFIAP L FK+KN+++++D
TEIIIVDPE EYS+IG+ FGGE IDIAPDS T+LNVL+LS+ENMDEDPVKVKSEFLLS+I
GKLLDRKMDGREKS+ I DR VTRLTY+ F PSL EWVFVLSQQPE+EA++LALDMELYVEG

SLDIFSH+TNI+T S+FLIYNVKKLGD3LKQIALMV+FDQIMSKW+NQKLGKKTWIYFD

Query: 660 EMQLLLLDKYASDFFFKLWSRVRKYGAI FIGITQNVETLLLDANGRRI IANSEFMILLKQ 719
           E++LLLLDKY SDFFFKLWSRVRKYGA PTGITQNVETLLLD NGRRI IANSEFMILLKQ
Sbjct: 663 EIELLLLDKYPSDFFFKLWSRTOICYGASPTGITQNVETLLLDPNGRRI IANSEFMILLKQ 722

Query: 720 AKSDREELVHMLGLSKELEKYLVNPEKGAGLIKAGSTWPFKNKIPQHTKLFDIMSTDPE 779
           AK+DREELV +LGLSKELEKYLVNPEKGAGLIKAGS WPFKNKIPQ 4+LFDIM +DP+
Sbjct: 723 AKNDREELVQLLGLSKELEKYLVNPEKGAGLIKAGSVi/i/PFKNKIPQGSQLFDIMRSDPD 782

Query: 780 KM 781
           KM
Sbjct: 783 KM 784

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8925> and protein <SEQ ID 8926> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -26.26
GvH: Signal Score (-7.5): -3.87
Possible site: 55
>>> Seems to have no N-terminal signal sequence
ALOM program count: 0 value: 6.26 threshold: 0.0
PERIPHERAL Likelihood = 6.26 335
modified ALOM score: -1.75

*** Reasoning Step: 3

Final Results

bacterial cytoplasm --- Certainty=0.5526 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

33.5/57.2% over 789aa
Enterococcus faecalis
GP|8100663| TrsE-like protein Insert characterized

ORF01332(319 - 2628 of 2949)
GP|8100663|gb|AAF72347.1|AF192329_8|AF192329(2 - 791 of 799) TrsE-like protein {Enterococcus faecalis}
%Match = 20.7
%Identity = 33.4 %Similarity = 57.2
Matches = 259 Mismatches = 323 Conservative Sub.s = 184

210 240 270 300 330 360 387
SCYLGSIAPTIYHLKOTSSTVFIMN*RCQTAHLLEEKETNVKI<LKHSMKSKTSSNDKKQKTKTQKQEI --S
II :| : |: 1 = = M
MSKKEIPRETEKTKLTRAQRKEIDAVIRKYKGDGR
10 20 30

414 444 474 504 534 564 594 624
PSTW-TIAYQGLFQNGLMQVSPSYFSQTYLLGDVNYQTVGLDDKGAIVEiCi'SDLINSLDDKTNFQLTIFNQKVNLEKFR
1 I 1= = = :h =111 11= = l = = ll I = II II II I =1 = l = = =1 = 11= = =
PHTAQQSIPYEVMYPDGVCRVSPGVFSKCIEFADISYQLAQPDTQTAIFEIOjCDLYNYVDASIHIQFSFIMKVDPVQYA
50 60 70 80 90 100 110

654 684 714 744 774 804 834 864
KSILYPLQEDGFDTYRDELNRMMDANLFAGENNFSAVKFLSFGKSDQTPKIAFRSLSQIGEYFKSGFSEIDVSLGLLGGE
II I I II I I :: I II 1=1=1 == I I I =11 = I = :: I
KSFEIAPQGDDFDDIRAEYTGILQKQLANGNNGMVICTKYLTFTIEAESVKAA
130 140 150 160 170 180 190

894 921 951 981 1011 1041 1071 1101
ERVNVLflDl^RGENHL-PFSYinjLTLSGQOT
||=|:| = = = I =1 I II III 1111= 11= ::=::: || | : |:: ::
ERI^LLHGVYHPDGEII^FDWiaJLAPSGLSTKDFIAPSSLCFGNAKTFGMGGKYGAVSFLQILSPELSDDMLADFIOTES
210 220 230 240 250 260 270

1131 1161 1191 1221 1251 1281 1311 1341
EVMISLHAKGSTKSETMTKLRTKKTLMESQKIGEQQKMARTGIYL
|:::|| : :::::: | | ::: || ||:| |:| =: : | >:|: II : ::|| ||=
GVLVNLHVQAIEQTKAIKTIKRKITDLDAMKIAEQKKAWSGYCMDILPSDIATYGEDAKKIjLTKLQTRNERLFQLTFLV
290 300 310 320 330 340 350

1371 1401 1431 1461 1491 1521 1551
gvladtedqlkqsldiikqvagsndmiidnl™
:|||: =| = II :: = I I II = I 11 = 1 I = =111 N = = ll 1= = = = I
IimDTKQIa J ^INDVFQAAGVAQKlmCPLWLDYQQEQGIASSLPLGVNQI-KIQRSLTTSSVAVWPFVTQEI 1 FQGGA^
370 380 390 400 410 420 430

1608 1638 1663 1698 1728 1758 1788 1818
FYGINQISSNI ISIDRGKLHTPSGLILGTSGAGKGMATKHEI ISTKLKEADSDTEI I IVDPENEYSIIGQAFGGESIDIA
= 1111 I 1 = 1 =11 = I: I III Ml 1 = I 11 = 1 I I =1 I III II = = = 1= I = =
YYGINAKSRNMIMLDRKQARCPNALKLGTPGSGKSMSCKSEIVSVFLTTPD D I FI SDPEAEYYPLVKRLHGQVI RLS
450 460 470 480 490 500 510

1848 1875 1905 1935 1959 1989 2019
PDSTTFLNVLELS-DENMDEDPVKVKSEFLLSWIGKLLDRK--MDGREKSLIDRVTRLTYKHFDTPSLVEWVFVLS
II | = | | = := = = l = = |: =11 = 1 = 11= == I == lh = lll 1= 1= = I = =11
PTSKDFVNPLDINLiraSEDDNPIALKSDFVLSFCEL^^
530 540 550 560 570 580 590

2058 2088 2118 2148 2178 2208 2238 2268
QQPEQEAKDLALDMELYVEGSLDIFSHRTmKTDSHFLIYWKXLGDEL^
| || :| ::||| |||==|=||||= : == ===1=11 =ll== ==== llll II 1= II II
ALLDQHVPFADRVAQALDLYVSGSIJWFiraRTimJIGNRLVSFDIKELGKQLKKLGMLIVQDQIWGRVTAW
610 620 630 640 650 660 670

2298 2328 2358 2388 2418 2448 2478 2508
YFDEMQLliLDDKYASDFFFKLWSRWKYGAIPTGITQNVETLLLDANGRRIIANSEFMILLKQAKSDREELVHMLGLSKE
: II =1|| == = = ==l I 11=1 llll 1111= II 1= 11=1= II II 11= 1 I II I
FADEFHLLLKEEQTAAYSAEIWKRFRKWGGIPTGATQNVKDLLSSPEIENILENSDFITLLNQASGDRKILAERLNLSTE
690 700 710 720 730 740 750

2538 2568 2598 2628 2658 2688 2718 2748
LEKYLWPEKGAGLIKAGSTWPFKNKIPQHTKLFDIMSTDPEKI']RT*DERG*KASQTG*AKLSKQLKISSYALSERS*D
= 11=11111= = 1 = 11 I II = = 1 = 1= 11 = 1 =
QQKYIDNSEPGEGLLIFEWWLPFTNPIPHNTQLYKIMTTRLKEVAGV
770 780 790

A related GBS gene <SEQ ID 8927> and protein <SEQ ID 8928> were also identified. Analysis of this 
protein sequence reveals the following: 

This protein might be involved in vancomycin resistance 

The protein has homology with the following sequences in the databases:

>GP|8100663|gb|AAF72347.1|AF192329_8|AF192329 TrsE-like protein {Enterococcus faecalis}
Score = 427 bits (1086), Expect = e-118
Identities = 257/785 (32%), Positives = 431/785 (54%), Gaps = 28/785 (3%)

Query: 9 DKKQKTKTQKQE I S PSTVN-TDAYQGLFQNGLMQVSPSYFSQTYLLGDV 56
         +K + T+ Q++EI P T ++ Y+ ++ +G+ +VSP FS+ D+
Sbjct: 11 EKTKLTRAQRraiDAVIRICYKGDGRPHTAQQSIPYEvMYPDGVCRVSPGVFSKCIEFADI 70

Query: 57 NYQTVGLDDKGAIVEKYSDLINSLDDKTNFQLTIFNQICNLEKFRKSILYPLQEDGFDTY 116

Query: 117 RDELNRMMDANLEAGENNFSAVKFLSFGKSDQTPKLAFRSLSQ1GEYFKSGFSEIDVSLG 176
           RE ++ L G N K+L+F ++ K A L +IG F +
Sbjct: 131 RAEYTGILQKQLANGHNGMVKTKYLTFTIEAESVKAARftRLKRIGFDLLGyFKSMGAVAH 190

Query: 177 LLGGEERVinnjyDMLRGEKHL-PFSYKDLTLSGQSTKHFIAPTYLSFKHKNHIELDDRLL 235
           ++ G ER+N+L + + + F +K L SG STK FIAP+ L F + + +
Sbjct: 191 VMDGWERLNLLHGVYHPDGEIFNFDWKVILAPSGLSTKDFIAPSSLCFGNAKTFGMGGKYG 250

Query: 236 QIVYVRDYGMELGDKFIRDLMQSDLEWnSLHAKGSTKSETMTKLRTKKTLMESQKIGEQ 295
           + +++ EL D + D + ++ V+++LH + +++ + 4-+ K T +++ KI EQ
Sbjct: 251 AVSFLQILSPELSDDMLADFLNTE.SGVLWLHi/QAIEQTKAIKTIKRKITDLDAMKIAEQ 310

Query: 296 QKMARTGIYLEKVGHVLENNIDEAEALLQTMTQTGDKLFDTVFLIGVLADTEDQLKQSIjD 355
           +K R+G ++ + L ++A+ LL + ++LF FL+ +ADT+ +L +
Sbjct: 311 KKAWSGYDMDILPSDIATYGEDAKQliLTKLQTR^RLFQLTFLVIjWADTKQKIiNNDVF 370

Query: 356 IIKQVAGSNDMIIDNLTYMQEAAFNSLLPFGKKYLEGVSRSLLTSNIAVNAPWTSVDIHD 415
           VA ++ + L Y QE S LP G N ++ + RSL TS++AV P+ + ++
Sbjct: 371 QAAGVAQKHNCPLWLDYQQEQGLASSLPLGVNQI K- IQRSLTTSSVAVFVPFVTQELFQ 429

Query: 416 KGGK- FYGINQISSNI ISIDRGKLNTPSGLILGTSGAGKGMATKHEI ISTKLKEADSDTE 474
           G +YGIN S N+I +DR + P+ L LGT G+GK M+ K EI+S L D +
Sbjct: 430 GGAAMYYGINAKSRNMIMLDRKQARCPNALKLGTPGSGKSMSCKSEIVSVFLTTPD D 486

Query: 475 I1IVDPENEYSIIGQAFGGESIDIAPDSTTFLNVLELS-DENMDEDPVKVKSEFLLSWIG 533
           I I DPE EY + + G+ I + + P S F+N L+++ + + D++P+ +KS+F+LS+
Sbjct: 487 IFISDPEAEYYPLVKRLHGQVIRLSPTSKDFVNPLDINLNYSEDDNPLALKSDFVLSFCE 546

Query: 534 KLLDRK- -MDGREKSLIDRVTRLTYKHF DTPSLVEWVFVLSQQPEQEAKDLAL 584
           ++ K ++ EK++IDR R+ Y+ + + P L + L Q EA +A
Sbjct: 547 LVMGGKNGLEAIEKTVIDRAVRVIYRPYLADPRPENMPILSDLHKALLDQHVPEADRVAQ 606

Query: 585 DMELYVEGSLDIFSHRTNIKTDSHFLIYNVKKLGDELKQIALMVIFDQIVgNRWKNQKLG 644
           ++LYV GSL++F+HRTN+ + + +++K+LG +LK++ ++++ DQIW RV N+ G
Sbjct: 607 ALDLYVSGS LNVFNHRTNVD IGNRLVSFD I KELGKQLKKLGML I VQDQIWGRVTANRSQG 666

Query: 645 KKTWIYFDEMQLLLLDKYASDFFFKLWSRVRI<YGAIPTGITQHVETLLLDANGRRIIANS 704
           K TW + DE LLL ++ + + ++W R RK+G IPTG TQNV+ LL 1+ NS
Sbjct: 667 KATWYFADEFHLLLKEEQTAAYSAEIWKRFRKWGGIPTGATQNVKDLLSSPEIENILENS 726

Query: 705 EFMILLKQAKSDREELVHMLGLSKELEKYLVNPEKGAGLIKAGSTWPFKNKIPQHTKLF 764
           +F+ LL QA DR+ L h LS E +KY+ N E G GL+ + V+PF N IP +T+L+
Sbjct: 727 DFITLLNQASGDRKILAERLNLSTEQQKYIDNSEPGEGLLIFENWLPFTNPIPHNTQLY 786

Query: 765 DIMST 769
           IM+T
Sbjct: 787 KIMTT 791

SEQ ID 8926 (GBS75) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 17 (lane 11; MW 89.8kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 20 (lane 6; MW 114.7kDa).

GBS75-GST was purified as shown in Figure 197, lane 8.

GBS329 was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown
in Figure 77 (lane 8; MW 89kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE
analysis of total cell extract is shown in Figure 174 (lane 2; MW 114kDa).

GBS329-GST was purified as shown in Figure 220, lanes 9 & 10.

Example 1891

A DNA sequence (GBSx1999) was identified in S.agalactiae <SEQ ID 5875> which encodes the amino
acid sequence <SEQ ID 5876>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2442 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1892 

A DNA sequence (GBSx2000) was identified in S.agalactiae <SEQ ID 5877> which encodes the amino
acid sequence <SEQ ID 5878>. This protein is predicted to be DNA-directed RNA polymerase II largest
subunit. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4393 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1893 

A DNA sequence (GBSx2001) was identified in S.agalactiae <SEQ ID 5879> which encodes the amino
acid sequence <SEQ ID 5880>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.92 Transmembrane 256 - 272 ( 250 - 277)
INTEGRAL Likelihood = [?].28 Transmembrane 216 - 232 ( 213 - 244)
INTEGRAL Likelihood = [?] Transmembrane [?] - 167 ( 148 - 191)
INTEGRAL Likelihood = -7.27 Transmembrane 57 - 73 ( 54 - 80)
INTEGRAL Likelihood = -6.74 Transmembrane 93 - 109 ( 88 - 111)
INTEGRAL Likelihood = -3.SO Transmembrane 172 - 188 ( 168 - 191)
INTEGRAL Likelihood = -2.76 Transmembrane 113 - 129 ( 110 - 130)

Final Results

bacterial membrane --- Certainty=0.4970 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG38039 GB:AF295925 Orf23 [Streptococcus pneumoniae]
Identities = 71/86 (82%), Positives = 83/86 (95%)

Query: 37 VKSLADENPTWSYMTAITKGIMQPLGVAIIAWLVIiEFSKMAKKIANSGGAMTPEAIAP 96
          +KSL+ +NPTVW+YM++ITK +MQPLGVAIL+VVL+LEFSKI4AKKIANSGGAMTFEA-tAP
Sbjct: 1 MKSLSSYNPTVWTYMSSITKSVMQPLGVAILSVVLILEFSKMAKKIANSGGAMTFEALAP 60

Query: 97 MIVSYIMVAWITNTTVIVEAIIAIA 122
          M++SYIMVAWITNTTVIVEAII IA
Sbjct: 61 MLISYIMVAWITNTTVIVEAIIGIA 86

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1894

A DNA sequence (GBSx2002) was identified in S.agalactiae <SEQ ID 5881> which encodes the amino
acid sequence <SEQ ID 5882>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.54 Transmembrane 32 - 48 ( 25 - 52)
INTEGRAL Likelihood = -4.09 Transmembrane 63 - 79 ( 62 - 80)

Final Results

bacterial membrane --- Certainty=0.4015 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9933> which encodes amino acid sequence <SEQ ID 9934>
was also identified. A related GBS nucleic acid sequence <SEQ ID 10777> which encodes amino acid
sequence <SEQ ID 10778> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1895

A DNA sequence (GBSx2003) was identified in S.agalactiae <SEQ ID 5883> which encodes the amino
acid sequence <SEQ ID 5884>. This protein is predicted to be TrsK-like protein (traK). Analysis of this
protein sequence reveals the following:

Possible site: 34
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -7.38 Transmembrane 66 - 82 ( 62 - 85)

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG38037 GB:AF295925 Orf21 [Streptococcus pneumoniae]
Identities = 343/457 (75%), Positives = 385/457 (84%), Gaps = 24/457 (5%)

Query: 142
           + VIGGSG+GKTFRFVKPNLIQ+N SNIWDPKDHLAEKTGKLFLE+GYQVKVLDLVNM
Sbjct: 1 MAVIGGSGSGKTFREVKPNLIQMNSSNlVVDPKDHIifiEKTGKXPliEHGYQVKVLDLVNMK 6

Query: 202
           NSDGFNPFRY+ETENDLNRML VYFNNTKB+GSRSDPFWDEASMTLVRA+ASyLVDFXWP
Sbjct: 61

Query: 262
           K+E E R+KRGR P E
Sbjct: 121

Query: 301
           +LE+LFE+YAKKYG ENFTMRNWADFQNYKDKTLDSVIAVTTAKFALFNIQSV+DLT+RD
Sbjct: 180

Query: 361
           T+D+KTWG +K+MVYLVIPDND+TFRFLSAL FF+ F T
Sbjct: 240

Query: 419
           YLDEFAN+GEIPDFAEQTSTVRSRNMSLVPILQNIAQLQGLYKEKEAWKTILGNCDSL+Y
Sbjct: 300

Query: 479
           LGGNDE+TFKFMSGLLGKQT+DVR+TSRSFGQTGS S SHQKIARDLMT E
Sbjct: 360

Query: 539
           CLVRIA +PVF++KKY KH +WK LA++ETD+R Vi
Sbjct: 420



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8929> and protein <SEQ ID 8930> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 5.53
GvH: Signal Score (-7.5): -0.78
Possible site: 34
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -7.38 threshold: 0.0
INTEGRAL Likelihood = -7.38 Transmembrane 66 - 82 ( 62 - 85)
PERIPHERAL Likelihood = 1.75 338
modified ALOM score: 1.98

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

33.9/50.9% over 419aa
Lactococcus lactis
GP|3582206| trsK protein (traK) Insert characterized
PIR|T43089|T43089 transfer complex protein TrsK - plasmid pMRC01 Insert characterized

ORF00383(715 - 2004 of 2415)
GP|3582206|gb|AAC56002.1||AE001272(23 - 442 of 530) trsK protein (traK) {Lactococcus lactis}
PIR|T43089|T43089 transfer complex protein TrsK - Lactococcus lactis plasmid pMRC01
%Match = 10.1
%Identity = 33.8 %Similarity = 50.8
Matches = 141 Mismatches = 193 Conservative Sub.s = 71



SFLAFILGVLMMTLVYLWSTGQKVYREGEEYGSARFGTSKE^ 

I : hh = 

MNGTILGVLDNKI IYQDNTTKPNRNVM 



VIGGSGAGKTFRFVKPNLIQLNCSNIW-DPKDHIJffiKTGKLFLENGYQv^ 

llllll: II I II -III III I III : I IN I::: II =11 =111 1 = 1 = 

VIGGSGSYKTQSWITNLFNETHtfSIVVTDPKGELYEKTAGIK^ 

40 50 60 70 80 90 100 

996 1026 1056 1086 1116 1146 1176 1194 

TWFNOTKGNGSRSDPFWDFASMTLVRAIASYLVDFYNPPGSSKQEQEARRKRGRYPAFSEIGKLIKLLSKGD NQD 

I : ) : I |::|: ::: -hi I hi 

TKIVQSENAEGKK- -DVWFSTQRQIjLKALILFVM KERSPEQRNLAGVINVLQTFDSEPINKD 

120 130 140 150 160 

1221 1251 1281 1311 1341 1371 1401 1431 

K-SILEVLFEDYAKKYGHENFTMR1WADFQNYKDKTLDSVIAVTTAKFALFNIQSVIDLTQRDTMDLKTWGTQKTMVYLV 
: I |: II I I I hi hh | : | : I = = | h I =1 = = l = = 

ENSDLDNLF - - LALKITHPARIAYELG- FKKAKGDMKAS IIS SLLATI SKFTDEE VSNFTS I SDFHLQDIGRKKI VLYVI 
180 190 200 210 220 230 240 



1461 1491 1521 1551 1581 1611 1641 1671 

IPDNDTTFRFLSAI.KFSTVFSTLTRQADVDFKGQLPIHVRSYLDEFANVGEIPDFAEQTSTVRSRNMSLVPILQNIAQLQ 

II 11= Mil =11 = 1= =111 Mil hh I = I =1 I ■■ ■■ I I = III 

IPVMDNTYESFI^FFSQMFDELYIOASSN-GAKLPQETOFILDEFVNLGKFPXYEEFLATCRGYGIGVTTICQTLTQLQ
260 270 280 290 300 310 320 



1701 1731 1761 1791 1809 1839 1869 1899 

GLYKEKEAWKT I LGNCDSLLYLGGNDEETFKFMSGLLGKQTVD VR STSRSFGQTGSSSTSHQKIARDLMTADEVGT 

|| |1 : | ==| I |: I 1111 11 I III 1 : 1 I h =1 111 11=

SLY-GKEKAESILGNHAVKICmASNEATAKYFSELLGKSWKVETGSESTSHSKETSTSKSDSYSYTSRQLMTPDEIIR 
340 350 360 370 380 390 400 

1929 1956 1974 2004 2034 2064 2094 2124 

MKRDECLVRIAGV- PVFRTK KYFPLKHKHWKLIADKETODRWJNYHINPLAKEEELDLSDYQIRDLSTETSLH**K

I : h h II 1 II II =1 

MPDTQSLLIFTNQKPIKATKAFQFKLFPDADSKVKLEQNKYVGIT^ 

420 430 440 450 460 470 480 

SEQ ID 5884 (GBS11d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 151 (lane 6; MW 61kDa) and in Figure 182 (lane 10; MW 61kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 12
(lane 5; MW 91.5kDa).

Example 1896 

A DNA sequence (GBSx2004) was identified in S.agalactiae <SEQ ID 5885> which encodes the amino
acid sequence <SEQ ID 5886>. Analysis of this protein sequence reveals the following:

Possible site: 50
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4192 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9935> which encodes amino acid sequence <SEQ ID 9936>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1897 

A DNA sequence (GBSx2005) was identified in S.agalactiae <SEQ ID 5887> which encodes the amino
acid sequence <SEQ ID 5888>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3391 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1898 

A DNA sequence (GBSx2006) was identified in S.agalactiae <SEQ ID 5889> which encodes the amino
acid sequence <SEQ ID 5890>. Analysis of this protein sequence reveals the following:

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = [?] Transmembrane [?]
INTEGRAL Likelihood = -[?] Transmembrane [?]
INTEGRAL Likelihood = -[?].73 Transmembrane 106 - 122 ( 105 - 123)
INTEGRAL Likelihood = -[?].46 Transmembrane 6 - 22
INTEGRAL Likelihood = -[?].13 Transmembrane 154 - 170
INTEGRAL Likelihood = -[?].53 Transmembrane 180 - 196 ( 180 - 196)

Final Results

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9937> which encodes amino acid sequence <SEQ ID 9938> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAA11325 GB:D78257 ORF8 [Enterococcus faecalis]
Identities = 35/102 (34%), Positives = 57/102 (55%), Gaps = 4/102 (3%)

Query: 90 TRNQAVLVQVGKQVPPIIFLLFL-VNASILEEIVyRQLLWEKLTF--PFEQIGVTSFLFV 146
          T N + L+++ V P++ +L L + A I+EEIV+R + L I ++SFLF
Sbjct: 7 TANDSTLI KLFSGVSPVLWLLLGIAAPIMEEIVFRGGI IGYLVENNALLAI LI SS FLFG 66

Query: 147 LSHGPNQLGSWLIYSCLGL'J DCMTAIALHLLWN 187
           + HGP S+ +Y +G+ L+V KT D +I++H L N
Sbjct: 67 IIHGPTNFISFGMYFFMGIILSV; KTKDLRVSISIHFLNN 108


No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8931> and protein <SEQ ID 8932> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 9.32
GvH: Signal Score (-7.5): -5.41
Possible site: 45
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 6 value: -10.03 threshold: 0.0
INTEGRAL Likelihood =-10.03 Transmembrane 68 - 84
INTEGRAL Likelihood = -7.06 Transmembrane 33 - 49
INTEGRAL Likelihood = -5.73 Transmembrane 105 - 122
INTEGRAL Likelihood = -4.46 Transmembrane 6 - 22
INTEGRAL Likelihood = -2.13 Transmembrane 154 - 170 ( 154 - 170)
INTEGRAL Likelihood = -0.53 Transmembrane 180 - 196
PERIPHERAL Likelihood = 1.38 131
modified ALOM score: 2.51

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF01326(568 - 861 of 1188)
EGAD|l4826l| 158156 (7 - 108 of 120) hypothetical protein
GP|1402529|dbj|BAA11325.1||D78257 ORF8 {Enterococcus faecalis}
%Match = 5.9
%Identity = 34.7 %Similarity = 60.4
Matches = 35 Mismatches = 37 Conservative Sub.s = 26

303 333 363 393 423 453 483
Y*L*RFI*EVTMIRIvLFYIAIQLNGLLVSLFLKEYLTIEGIVLLQLVLLSVTCLEIARHKTVPLKIVGVQNRLSWLLLG

FVAMVAFAVFISFLFPVQTRNOAVLVQVGKQVPPIIFLLFL-VNASILSEIVYRQLLWEKLT--FP-FEQIGVTSFLFVLS
I I = I::: I h= =1=1 = I hlllhi =1 :. I = = 1111 =
MQGHTTTANDSTLIKLFSGVSPVLWLLLGIAAPIMEEIVFRGGIIGYLVENNALLAILISSFLFGII

HGPNQIi3SWLIYSCLGLTIiAvWLKT-DCMTAIALHLLWNSIAYVVTFL*YQNQECFRIMEAPYV**GIEKRGGHYVI*T
III : h =1 =1= hi II I =l = = l = l I
HGPTNFISFGMYFFMGI ILSVSYYKTKDLRVS I S IHFLNNLFPAI AIAYGLI

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1899 

A DNA sequence (GBSx2007) was identified in S.agalactiae <SEQ ID 5891> which encodes the amino
acid sequence <SEQ ID 5892>. Analysis of this protein sequence reveals the following:

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2490 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9939> which encodes amino acid sequence <SEQ ID 9940>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1900 

A DNA sequence (GBSx2008) was identified in S.agalactiae <SEQ ID 5893> which encodes the amino
acid sequence <SEQ ID 5894>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5298 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

Query: 1 MNLLHKKSILDCTELEERIHQAETNQLLQKILSLPNFDCDFEVTFEDDYHfCEMNDPLFYE 60
         M L+K+SILDC ELE +H AE QL ++I +PN+ C+FEVTF DDYHK+ N PLFYE
Sbjct: 1 MKALNKESILDCDELETELHDAEIKQLDEQIFLMPNYPCEFEVTFLDDYHKKHNYPLFYE 60

Query: 61 SNLHQISDFMETRDIKNGVDTLLTKDNHLAFPAFGENYSARGKEGILTTLVTVKCFGEGR 120
          S L I +F+E++DIKNG D + +L F +G+ Y A GKEGILTT VTVK F E +
Sbjct: 61 SYLQNIMEFLESQDIKNGADAFVDDHQNLVFVLYGQGYPAEGKEGILTTQVTVKAFDEDK 120

Query: 121 MP1DMS 126
           P1+ +
Sbjct: 121 KPINFA 126

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1901 

A DNA sequence (GBSx2009) was identified in S.agalactiae <SEQ ID 5895> which encodes the amino 
acid sequence <SEQ ID 5896>. This protein is predicted to be methyl transferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence
Final Results 




bacterial cytoplasm --- Certainty=0.1209 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC98421 GB:L29323 methyl transferase [Streptococcus pneumoniae]
Identities = 323/449 (71%), Positives = 389/449 (85%), Gaps = 3/449 (0%)

Query: 1 MKFLDLFAGIGGFRLGMESQGHKCLGFCEIDKPARTSYICAMFNTEGEIEYHDIKEVTDHD 60 
M+F+DLF+GIGGFRLGMES GH+C+GFCEIDKFAR SYK++F TEGEIE+HDI++V+D +

Sbjct: 1 MRFIDLFSGIGGFRLGMESVGHECIGFCEIDKFARESYKBIFQTEGEIEFHDIRDVSDDE 60 

Query: 61 FRQFRGQVDIICGGFPCQAFSLAGRRLGFEDTRGTLFFEIARRAKQIQPRFLFLENVKGL 120 
F++ RG+VD+1CGGFPCQAFS+AGRRLGFEDTRGTLFFEIARAAKQIQPRFLFLENVKGL 
Sbjct: 61 FKKLRGKVDVICGGFPCQAFSIAGRRLGFEDTRGTLFFEIARAAKQIQPRFLFLENVKGL 120



Query: 121 USMDEGRTFATILSTLDELGYDVEWQVLNSKDFQVPQNRERVFIIGHSRRYRSRFIFPLR 180 

LNHD+GRTF TIL+TLDELG+DVEWQ+LNSKDF VPQNRERVFI IGHSR+ +R FP R 
Sbjct: 121 LNHDKGRTFTTILTTLDELGFDVEWQMLNSKDFGVPQNRERVFIIGHSRKRGTRLGFPFR 180 

Query: 181 RED- --SPAHLERLGNINPSKHGLNGEVYLTSGIAPTLTRGKGEGAKIAIPVLTPDRLEK 237 

RE +P L+ LGN+NPSK G++G+VY + GIAPTL RGKGEG KIAIP +TPDRL+K 
Sbjct: 181 REGQATNPETLKILGNLNPSKSGMSGKVYYSEGLAPTLVRGKGEGFKIAIPCMTPDRLDK 240 

Query: 238 RQHGRRFKDNQDPMFTLTSQDKHGVWAGNLPTSFDQTGRVFDISGLSPTLTTMQGGDKV 297 

RQ+GRRFKDNQ+ PMFTL +QD+HG+W G+LPTSF +TGRV+ GLSPTLTTMQGGDK+ 
Sbjct: 241 RQNGRRFKDNQEPMFTLNTQDRHGIVWGDLPTSFKETGRVYGSEGLSPTLTTMQGGDKI 300 

Query: 298 PKILLREmPFLKIKEATKTGYAKATI/SDSVNIAYPDSTKRRGRVGKGISm'LTTSnNMG 357 

PKIL+ E + FLK++EATK QYA+A +GDS+NL P S RRGRVGKGI +NTLTTS MG 
Sbjct: 301 PKI L I PEP I QFLKVREATKKGYAQAEIGDS INIjERPSSQHRRGRVGKGI ANTLTTSGQMG 360 



Query: 358 VWAAt,EYRQDKWYEVTGIVLEGKLYRLRIRRLTPRECFRLQGFPDWAYERAESVSSKSQ 417 

VWA+ E + Y+V G++++G+ YRLRIRR+TP+ECFRLQGFPDWA+E A VSS SQ 
Sbjct: 361 WVASYEGEDKQVYQVAGVLIDGQFYRIjRIRRITPKECFRLQGFPDWAFEAARKVSSNSQ 420 

Query: 418 LYKQAGNSVTVTVIEAIAREFRRTEEEEK 446 

LYKQAGNSVTV VI AIA+4 + EE+++ 
Sbjct: 421 LYKQAGNSVTVPVIAAIAKKLKEVEEKDE 449 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2435> which encodes the e 
sequence <SEQ ID 2436>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.1725 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 60/75 (80%) , Positives = 69/75 (92%) 



Query: 1 MKFLDLFAGIGGFRLGMESQGHKCLGFCEIDKFARTSYKAMFNTEGEIEYHDIKEVTDHD 60
MKFLDLFAGIGGFRLG+ +Q H+C+GFCEIDKFAR SYKA++ TEGEIE+HDI++VTD D

Sbjct: 4 MKFLDLFAGIGGFRLGLINQCHECIGFCEIDKFARQSYKAIYETEGEIEFHDIRQVTDQD 63

Query: 61 FRQFRGQVDIICGGF 75 
FRQ RGQVDIICGGF 
Sbjct: 64 FRQLRGQVDIICGGF 78



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 1902

A DNA sequence (GBSx2010) was identified in S.agalactiae <SEQ ID 5897> which encodes the amino 
acid sequence <SEQ ID 5898>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -9.71 Transmembrane 8 - 24 ( 3 - 30) 

Final Results 

bacterial membrane --- Certainty=0.4885 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9941> which encodes amino acid sequence <SEQ ID 9942> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 5899> which encodes the amino acid 
sequence <SEQ ID 5900>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.81 Transmembrane 20 - 36 ( 19 - 36)

Final Results 

bacterial membrane --- Certainty=0.1723 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 16/33 (48%) , Positives = 26/33 (78%) 

Query: 1 MNKMIWWILGGIYLISIIILIVEIIRAPEMDDH 33

++KM WW+L G++ + I LI+E+I APEM+D+
Sbjct: 12 VSKMFWWLLLGVWGLRTIWLIIEVITAPEMEDY 44

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1903 

A DNA sequence (GBSx2011) was identified in S.agalactiae <SEQ ID 5901> which encodes the amino 
acid sequence <SEQ ID 5902>. This protein is predicted to be ifn-response binding factor 1 (irebf-1).
Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4771 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD41248 GB:AF105927 unknown [Streptococcus suis] 
Identities = 258/272 (94%) , Positives = 266/272 (96%)

Query: 1 MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA 60 
MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA
Sbjct: 1 MKRITANQYQTSERYYKLPKILFESERYKDMKLEVKVAYAVLKDRLELSLSKGWIDEDGA 60

Query: 61 IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSERGRMANKIYLGELEHEP 120

IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSE+GRMANKIYLGELEHE 
Sbjct: 61 IYLIYSNSNLMALLGCSKSKLLSIKKTLREYGLIDEVQQSSSEKGRMANKIYLGELEHET 120 

Query: 121 TPVLHTDGASVKKTLGESQRKTGPVLYSAPSETEGSETKYSETEGSDLVMKDEEERQLVD 180 

TPVLHTDGASVKKTLG SQRKTGPVL SAPSETEGSETKYSET+GSD +++DEEERQ VD 
Sbjct: 121 TPVLHTDGASVKKTLGGSQRKTGPVLNSAPSETEGSETKYSETKGSDFLIEDEEERQQVD 180 

Query: 181 EKKEENFTSKVDGVTKYDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEQMRF 240

EK+EENFTSKVDGVT+YDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALE MRF 
Sbjct: 181 EKQEENFTSKVDGVTRYDRDYIWGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEHMRF 240 

Query: 241 ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGGE 272 

ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGG+ 
Sbjct: 241 ARSAEVIAEYVFNGVLSEWTKQLRRQEVKGGD 272 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5903> which encodes the amino acid 
sequence <SEQ ID 5904>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.5248 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 84/122 (68%), Positives = 99/122 (80%), Gaps = 2/122 (1%) 

Query: 145 VLYSAPSETEGSETKYSETEGSDLVMKDEEERQLVD--EKKEENFTSKVDGVTKYDRDYI 202

VL SAPSETE SET+ SET+ S+LV++DEEER+ +K E +FT +VD VTKYD+DYI 

Sbjct: 1 VLNSAPSETEKSETEGSETKESNLVIEDEEERKECTSVKKTEGHFTRQVDQVTKYDKDYI 60 

Query: 203 WGLVHDQLRQTGLSQSASDYAMIYFSDRYQYALEQMRFARSAEVIAEYVFNGVLSEWTKQ 262 

W LVH QLR+ GLSQ+ASD M YF +RY YALE +RFAR+AE IAEYVFNGVLSEWTKQ 
Sbjct: 61 WSLVHSQLREGGLSQAASDLVMSYFEERYAYALEHIRFARTAEAIAEYVFNGVLSEWTKQ 120 

Query: 263 LR 264 
LR 

Sbjct: 121 LR 122 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1904 

A DNA sequence (GBSx2012) was identified in S.agalactiae <SEQ ID 5905> which encodes the amino 
acid sequence <SEQ ID 5906>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.4191 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9375> which encodes amino acid sequence <SEQ ID 9376> 
was also identified. 
The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
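
The "Final Results" blocks in these Examples all follow one line pattern (compartment, certainty score, verdict). As a rough illustration only, the sketch below shows how such a block could be read programmatically to recover the highest-scoring compartment. The function names `parse_final_results` and `predicted_compartment` are hypothetical and are not part of the analysis software used here; the regular expression is an assumption about the line layout, written to tolerate the stray spaces that scanning introduces into the scores.

```python
import re

# Assumed layout: "bacterial <compartment> --- Certainty=<score> (<verdict>) < succ>".
# Dashes are optional and the score may contain stray spaces from scanning.
RESULT = re.compile(r"bacterial\s+(\w+)[\s\-]*Certainty\s*=\s*([0-9. ]+?)\s*\((.+?)\)")


def parse_final_results(text):
    """Map each compartment name to a (certainty, verdict) pair."""
    results = {}
    for match in RESULT.finditer(text):
        compartment, raw_score, verdict = match.groups()
        # Remove spaces inside the number before converting it.
        results[compartment] = (float(raw_score.replace(" ", "")), verdict)
    return results


def predicted_compartment(results):
    """Return the compartment with the highest certainty score."""
    return max(results, key=lambda name: results[name][0])


example = """
bacterial cytoplasm --- Certainty=0.4191 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""

results = parse_final_results(example)
print(predicted_compartment(results))
```

Applied to the block for SEQ ID 5906 above, the sketch selects the cytoplasm, the only compartment marked "Affirmative".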

Example 1905 

A DNA sequence (GBSx2013) was identified in S.agalactiae <SEQ ID 5907> which encodes the amino 
acid sequence <SEQ ID 5908>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3723 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1906 

A DNA sequence (GBSx2014) was identified in S.agalactiae <SEQ ID 5909> which encodes the amino 
acid sequence <SEQ ID 5910>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3053 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1907 

A DNA sequence (GBSx2015) was identified in S.agalactiae <SEQ ID 5911> which encodes the amino
acid sequence <SEQ ID 5912>. This protein is predicted to be 50S ribosomal protein L7/L12 (rplL).
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1034 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9943> which encodes amino acid sequence <SEQ ID 9944>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11881 GB:Z99104 ribosomal protein L12 (BL9) [Bacillus subtilis]
Identities = 83/123 (67%) , Positives = 95/123 (76%) , Gaps = 2/123 (1%) 

Query: 6 MALNIENIIAEIKEATILELNDLVKAIEEEFGVTAAAPVAAA--AAGGEAAAAKDSFDVE 63

MALNIE IIA 4-KEAT+LELNDLVKAIKEEFGVTAAAPVA 1 MSffl 4- FD4 
Sbjct: 1 MAMIEEIlASVimTVIiEIMJLVKAIEEEFGVTAAAPVAVAGGAAAGGAAEEQSEFDLl 60 

Query: 64 LTAAGDKKVGVIKVVREITGEGLKEAKAIVDNAPSVIKEGASFAEANEIKEKLEAAGASV 123

L AG +K+ VIKWREITG GLKEAK +VDW P +KEG 4-4- EA E+K KLE GASV 
Sbjct: 61 IAGAGSQKIKVIKVVREITGIX3LKEAKELVDOTPKPLKEGIAKEE^ELK?iKLEEVGASV 120 

Query: 124 TLK 126
+K 

Sbjct: 121 EVK 123 

A related DNA sequence was identified in S. pyogenes <SEQ ID 5913> which encodes the amino acid 
sequence <SEQ ID 5914>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1164 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 104/126 (82%) , Positives = 113/126 (89%) 

Query: 1 MSEITMALNIENIIAEIKEATILELNDLVKAIEEEFGVTAAAPVAAAAAGGEAAAAKDSF 60

+EEITMALNIENIIAEIKEA+ILELNDLVKAIEEEFGVTAAAPVAAAAAGG AAKDSF
Sbjct: 1 LEEITMfeimENIIAEII^SIuELM>LVKM^ 60

Query: 61 DVELTAAGDKKVGVIKVVREITGEGLKEAKAIVDNAPSVIKEGASFAEANEIKEKLEAAG 120

DVELT+AGDKKVGVIK VREITG GLKEAK +VD AP+ +KEG + AEA EIK KLE AG
Sbjct: 61 DVELTSAGDKKVGVIKAVREITGLGLKEAKGLVDGAPANVKEGVAAAEAEEIKAKLEEAG 120

Query: 121 ASVTLK 126

A++TLK
Sbjct: 121 ATITLK 126



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1908 

A DNA sequence (GBSx2017) was identified in S.agalactiae <SEQ ID 5915> which encodes the amino 
acid sequence <SEQ ID 5916>. This protein is predicted to be ribosomal protein L10 (rplJ). Analysis of this 
protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1251 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 14 MSEAIIAKKAEQVELIAEKMKAAASIVVVDSRGLTVEQDTNLRRSLRESDVEFKVIKNSI 73

MS AI KK VE IA K+K + S ++VD RGL V + T LR+ LRE++VE KV KN++ 
Sbjct: 1 MSSAIETKKW-VEEIASKLKESKSTIIVDYRGIlWSEVTELRKQLREai^SKVYKlCTM 59 

Query: 74 LTRAAEKAGLEDLKELFVGPSAVAFSNEDVIAPAKVISDFAKDAEALEIKGGSVDGKFTS 133 

RA E+A L L + GP+A+AFS EDV+APAKV++DFAK+ EALEIK G ++GK ++ 
Sbjct: 60 TRRAVEQAEIMGUmFLTGPNaiAFSTEDWAPAKOTJTOFAKNHEALEIKAGVIEGKVST 119 

Query: 134 VEEINALAKLPNKEGMLSMLLSVLQAPVRNVAYAVKAVAEKDEE 177

VEE+ ALA+LP +EG+LSMLLSVL+APVRN+A A KAVAE+ EE
Sbjct: 120 VEEVKALAELPPREGLLSMLLSVLKAPVRNLALAAKAVAEQKEE 163

A related DNA sequence was identified in S. pyogenes <SEQ ID 5917> which encodes the amino acid 
sequence <SEQ ID 5918>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -5.47 Transmembrane 7 - 23 ( 5 - 24)

Final Results

bacterial membrane --- Certainty=0.3187 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 149/176 (84%) , Positives = 162/176 (91%)

Query: 4 SQKIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIVVVDSRGLTVEQDTNLRRSLRESD 63

S KIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIV+VDSRGLTV+QDT LRRSLRES 
Sbjct: 23 SPKIKTEVKLMSEAIIAKKAEQVELIAEKMKAAASIVIVDSRGLTVDQDTVLRRSLRESG 82 

Query: 64 VEFKVIKNSILTRAAEKAGLEDLKELFVGPSAVAFSNEDVIAPAKVISDFAKDAEALEIK 123 

VEFKVIKNSILTRAAEKAGL++LK++FVGPSAVAFSNEDVIAPAKVI+DF K A+ALEIK 
Sbjct: 83 VEFKVIKNSILTRAAEKAGLDELKDVFVGPSAVAFSNEDVIAPAKVINDFTKTADALEIK 142 

Query: 124 GGSVDGKFTSVEEINALAKLPNKEGMLSMLLSVLQAPVRNVAYAVKAVAEKDEEVA 179

GG+++G +S EEI ALA LPN+EGMLSMLLSVLQAPVRNVAYAVKAVAE E A
Sbjct: 143 GGAIEGAVSSKEEIQALATLPNREGMLSMLLSVLQAPVRNVAYAVKAVAENKEGAA 198

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1909 

A DNA sequence (GBSx2018) was identified in S.agalactiae <SEQ ID 5919> which encodes the amino 
acid sequence <SEQ ID 5920>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.22 Transmembrane 125 - 141 ( 105 - 143) 
INTEGRAL Likelihood = -1.91 Transmembrane 108 - 124 ( 106 - 124) 

Final Results 

bacterial membrane --- Certainty=0.3888 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10931> which encodes amino acid sequence <SEQ ID
10932> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1910 

A DNA sequence (GBSx2019) was identified in S.agalactiae <SEQ ID 5921> which encodes the amino 
acid sequence <SEQ ID 5922>. This protein is predicted to be Clp-like ATP-dependent protease binding 
subunit (clpC). Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3483 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA68910 GB:L34677 Clp-like ATP-dependent protease binding 
subunit [Bos taurus] 
Identities = 437/589 (74%), Positives = 514/589 (87%), Gaps = 5/589 (0%) 

Query: 10 DPFGN-MDDIFNSLMGNMGGYNSENKRYLINGREVTPEEFSQYRQTGKLPGQELNNQNTP 68 

DPF N MDD+FN LMG M G NSEN+RYLINGREVTPEE++ +RQTGKLPG Q 
Sbjct: 2 DPFNNDMDDLFNQLMGGMNGWSENRRYLINGREOTPEEYAAFRQTGKLPGVTDPTQ-AK 60 

Query: 69 TNQVSADSVLTKLGTNLTDQARQHLLDPVIGRNKEIQETAEILARRTKNNPVLVGDAGVG 128

T Q DS+L KLG NLT +A++ LDPVIGRNKEIQETAEIL+RRTKNNPVLVGDAGVG
Sbjct: 61 TKQPQPDSMLAKLGRNLTQEAKEGKLDPVIGRNKEIQETAEILSRRTKNNPVLVGDAGVG 120 

Query: 129 KTAVIEGLAQAIINGDVPAAIKNKEIISIDISSLEAGTQYRGSFEENIQNIIKEVKETGN 188 

KTAV+EGLAQAI+ GDVPAAIKNK+IISIDISSLEAGTQYRGSFEEN+Q +1 EVK+ GN 
Sbjct: 121 KTAWEGLAQAIVAGDVPAAIKNKQIISIDISSLEAGTQYRGSFEENMQKLIDEVKKDGN 180 

Query: 189 IILFFDEIHQILGAGSTGGDSGSKGLADILKPALSRGELTVIGATTQDEYRNTILKNAAL 248

+ILFFDEIHQI+GAG+ G SGSKG+ADILKPALSRGE+T+IGATTQDEYRNTILK+AAL 
Sbjct: 181 VILFFDEIHQIIGAGNAGDASGSKGMADILKPALSRGEVTLIGATTQDEYRNTILKDAAL 240 

Query: 249 ARRFNEVKVNAPSAQDTFNILMGIRNLYEQHHNVVLPDSVLKAAVDLSIQYIPQRSLPDK 308

+RRFN+V VNAPS +DTF IL G+R LYE+HHNV LPD VLKAA+D S+QYIPQRSLPDK 
Sbjct: 241 SRRFNQVTVNAPSKEDTFKILQGLR:<LYEP<HHN\rSLPDEVLICAAIDYSVQYIPQRSLPDK 300 

Query: 309 AIDLIDMTAAHLARQHPVTDLKSLEKEIAEQRDKQEKAVNTEDFEEALKVKTRIEELQNQ 368 

AIDLID+TAAHLA++HPV D K++E+EI + KQ++AV ED++ A + K ++ +LQ+Q 
Sbjct: 301 AIDLIDWAAHLASKHPVKDAKTIEEEIKKTEAKQQEAVEKEDYQAAQEAKDQVAKLQDQ 360 

Query: 369 IDNHTEGQKVTATINDIAMSIERLTGVPVSNMGASDIERLKELGNRLKGKVIGQNDAVEA 428 

+ +H+E ++V AT +D+A. ++ER+TG+PVS MGASDIERLK L RL+GKVIGQ +AVEA 
Sbjct: 361 LKDHSESERWATPSDVAAAVERMTGIPVSKMGASDIERLKGLATRLEGKV'IGQQEAVEA 420 

Query: 429 VARAIRRNRAGFDDGNRPIGSFLFVGPTGVGKTE1AKQ1AFDMFGSKDAIVRLDMSEYND 488 

V+RAIFJO^GFD+GNRPIGSFLFVGPTGVGKTEIAKQIiA DMFGS + I+RLDMSEY D 
Sbjct: 421 VSRAIRRNRAGFDEGNRPIGSFLBVGPTGVGKTEI^QLALDMFGSTNDIIRLDMSEYTD 480 



Query: 489 RTAVSKLIGATAGYVGYDDNSNTLTERIRRNPYSIVLLDEIEKADPQVITLLLQVLDDGR 548 
RTAVSKLIG TAGYVGYDDNSNTLTE++RR+PYSIVLLDEIEKA+PQVITLLLQVLDDGR
Sbjct: 481 RTAVSKLIGTTAGYVGYDDNSNTLTEKVRRHPYSIVLLDEIEKANPQVITLLLQVLDDGR 540

Query: 549 LTDGQGNTINFKNTVIIATSNAGFGNEAFTGDSDKDLKIMERISPYFRP 597
LTDGQGNT++FKNT+IIATSNAGF ++A G+ D K+M+++ PYFRP
Sbjct: 541 LTDGQGNTVDFKNTIIIATSNAGFSSDAVAGE---DAKLMDKLQPYFRP 586

A related DNA sequence was identified in S.pyogenes <SEQ ID 5923> which encodes the amino acid 
sequence <SEQ ID 5924>. Analysis of this protein sequence reveals the following: 

Possible site: 22 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2718 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 551/697 (79%) , Positives = 616/697 (88%) , Gaps = 3/697 (0%) 

Query: 5 NFYNRDPFGNMDDIFNSLMGNMGGYNSENKRYLINGREVTPEEFSQYRQTGKLPGQELNN 64 

+F +DPF NMDDIFN LM NMGGY SEN RYL+NGRE+TPEEF YRQTG+LP 
Sbjct: 3 HFSGKDPFVNMDDIFNQLMANMGGYRSENPRYLVNGREITPEEFQHYRQTGQLPVATTKA 62

Query: 65 QNTPTNQVSADSVLTKLGTNLTDQARQHLLDPVIGRNKEIQETAEILARRTKNNPVLVGD 124 

N+ ADSVLT+LGTNLT +ARQ LDPVIGRNKEIQ+TAEILARRTKNNPVLVGD

Sbjct: 63 TNSQMLTPKADSVLTQLGTNLTQEARQGHLDPVIGRNKEIQDTAEILARRTKNNPVLVGD 122 

Query: 125 AGVGKTAVIEGLAQAIINGDVPAAIKNKEIISIDISSLEAGTQYRGSFEENIQNIIKEVK 184 

AGVGKTAVIEGLAQAI+NGDVPAAIKNKEI+SIDISSLEAGTQYRGSFEE IQN+I+EVK
Sbjct: 123 AGVGKTAVIEGLAQAIVNGDVPAAIKNKEIVSIDISSLEAGTQYRGSFEETIQNLIQEVK 182

Query: 185 ETGNIILFFDEIHQILGAGSTGGDSGSKGLADILKPALSRGELTVIGATTQDEYRNTILK 244 

E GNIILFFDEIHQI+GAG+T DSGSKGLADILKPALSRGELT+IGATTQDEYRNTILK
Sbjct: 183 EAGNIILFFDEIHQIVGAGATSSDSGSKGLADILKPALSRGELTLIGATTQDEYRNTILK 242 

Query: 245 NAALARRFNEVKVNAPSAQDTFNILMGIRNLYEQHHNVVLPDSVLKAAVDLSIQYIPQRS 304

NAALARRFNEVKVNAPSA+DTF+ILMGIRNLYEQHH++ LPD+VLKAAVD SIQYIPQRS
Sbjct: 243 NAALARRFNEVKVNAPSAEDTFHILMGIRNLYEQHHHITLPDNVLKAAVDYSIQYIPQRS 302

Query: 305 LPDKAIDLlDMTAMIIAAQHPVTDLKSLEKEIAEQRDKQEKAvNTEDFEEALKVKTRIEE 364 

LPDKAIDL+DMTAMJuAAQHPVTDLK+LE EIA+Q++ QEKAV EDFE+AL KTRIE 
Sbjct: 303 LPDKAIDLLDMTARHIAAQHPWDLKTLETEIAKQKESQEKAVAKEDFEKALAAKTRIET 362 

Query: 365 LQNQIDNHTEGQKVTATINDIAMSIERLTGVPVSNMGASDIERLKELGNRLKGKVIGQND 424 

LQ QI+ H + Q VTAT+NDIA S+ERLTG+PVSNMG +D+ERLK + +RLK VIGQ++ 
Sbjct: 363 LQKQIEQHNQSQNVTATVNDIAESVERLTGIPVSNMGTNDLERLKGISSRLKSHVIGQDE 422 

Query: 425 AVEAVARAIRRNRAGFDDGNRPIGSFLFVGPTGVGKTEIAKQLAFDMFGSKDAIVRLDMS 484 

AV AVARAIRRNRAGFDDG RPIGSFLFVGPTGVGKTEIAKQLA D+FGSKDAI+RLDMS
Sbjct: 423 AVAAVARAIRRNRAGFDDGKRPIGSFLFVGPTGVGKTEIAKQLALDLFGSKDAIIRLDMS 482 

Query: 485 EYNDRTAVSKLIGATAGYVGYDDNSNTLTERIRRNPYSIVLLDEIEKADPQVITLLLQVL 544

EYNDRTAVSKLIG TAGYVGYDDN+NTLTER+RRNPY+IVLLDEIEKADPQ+ITLLLQVL
Sbjct: 483 EYNDRTAVSKLIGTTAGYVGYDDNNNTLTERVRRNPYAIVLLDEIEKADPQIITLLLQVL 542

Query: 545 DDGRLTDGQGNTINFKNTVIIATSNAGFGNEAFTGDSDKDLKIMERISPYFRPEFLNRFN 604 

DDGRLTDGQGNTINFKNTVIIATSNAGFG + + IM+RI+PYFRPEFLNRFN

Sbjct: 543 DDGRLTDGQGNTINFKNTVIIATSNAGFGQQ---DTETSESNIMDRIAPYFRPEFLNRFN 599

Query: 605 GVIEFSHLSKDDLSEIVDLMLDEVNQTIGPCKGIDLWDENVKSHLIELGYDEAMGVRPLR 664 

+I+F+HL K+ L EIVDLML EVNQT KKGI Ii + ++ K+HLI+LGY+ AMG RPLR 
Sbjct: 600 SIIKFNHLQKESLEEIVDLMLAEVNQTTAKKGISLTITDDAKAHLIDLGYNHAMGARPLR 659



Query: 665 RVIEQEIRDRITDYYLDHTDVKHLKANLQDGQIVISE 701 
R+IEQEIRDRITDYYLDH +VK L+A L++GQ+VI +
Sbjct: 660 RIIEQEIRDRITDYYLDHPEVKKLQAILKEGQLVIRQ 696

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
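
Each alignment in these Examples is headed by a summary line giving identities, positives and (where present) gaps as counts over the alignment length. The following is a minimal sketch of how such a line could be parsed so that the counts can be checked or tabulated. The function name `parse_alignment_summary` is hypothetical, and the truncating integer percentage is only an assumption about how the printed figures were rounded; it reproduces the figures in the line used here but has not been checked against every alignment in this document.

```python
import re

# Captures "<field> = <count>/<length>" for the three summary fields.
FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, alignment_length, truncated_percent)}."""
    stats = {}
    for name, count, length in FIELD.findall(line):
        count, length = int(count), int(length)
        # Integer division truncates, e.g. 551/697 -> 79 (assumed rounding rule).
        stats[name] = (count, length, 100 * count // length)
    return stats


summary = "Identities = 551/697 (79%) , Positives = 616/697 (88%) , Gaps = 3/697 (0%)"
stats = parse_alignment_summary(summary)
for name, (count, length, percent) in stats.items():
    print(f"{name}: {count}/{length} ({percent}%)")
```

The summary line used in the sketch is the one given above for the alignment of the GAS and GBS ClpC proteins.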

Example 1911 

A DNA sequence (GBSx2020) was identified in S.agalactiae <SEQ ID 5925> which encodes the amino 
acid sequence <SEQ ID 5926>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -4.78 Transmembrane 8 - 24 ( 7 - 25) 

Final Results 

bacterial membrane --- Certainty=0.2911 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9945> which encodes amino acid sequence <SEQ ID 9946> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC73354 GB:AE000134 putative enzyme [Escherichia coli K12] 
Identities = 142/307 (46%), Positives = 195/307 (63%), Gaps = 6/307 (1%) 

Query: 39 KELLESKKTLILHGALGTELESRGCDVSGKLWSAKYLIEDPAAIQTIHEDYIRAGADIVT 98

+ LL+ + L+L GA+ TELE+RGC+++ LWSAK L+E+P I+ +H DY RAGA
Sbjct: 8 RALLDKQDILLLDGAMATELEARGCNIJffiSLWSAKVLVENPELIREVHLDYYRAGAQCAI 67 

Query: 99 TSTYQATLQGLAQVGVSESQTEDLIRLTVQLAKAAREQVWKSLTKEEKSERIYPLISGDV 158 

T++YQAT G A G+ E+Q++ LI +V+LA+ ARE L 4 ++ + L++G V 

Sbjct: 68 TASYQATPAGFAARGLDEAQSKALIGKSVELARKAREAY---LAENPQAGTL--LVAGSV 122

Query: 159 GPYAAFLADGSEYTGLYDIDKQGLKNFHRHRIELLLDEGVDILALETIPNAQEAEALIEL 218 

GPY A+LADGSEY G Y +4- FHR R+E LLD G D+LA ET+PN E EAL EL 
Sbjct: 123 GPYGAYLADGSEYRGDYHCSVEAFQAFHRPRVEALLDAGADLLACETLPNFSEIEALAEL 182 

Query: 219 LAEDFPQVEAYMSFTSQDGKTISDGSAVADLAKAIDVSPQWALGINCSSPSLVADFLQA 278 

L +P+ A+ SFT +D + +SDG+ + D+ + PQWALGINC + LQ 
Sbjct: 183 LTA- YPRARAWFS FTLRDSEHLSDGTPLRDWALIiAGYPQWALG INC IALENTTAALQH 241 

Query: 279 IAEQTNKPLVTYPNSGEVYDGASQSWQSSPDHSHTLLENTSDWQKLGAQWGGCCRTRPA 338 

+ T PLV YPNSGE YD S++W +H L + WQ GA+++GGCCRT PA 
Sbjct: 242 LHGLTVLPLWYPNSGEHYDAVSKTWHHHGEHCAQLADYLPQWQAAGARLIGGCCRTTPA 301 

Query: 339 DIADLSA 345 

DIA L A 
Sbjct: 302 DIAALKA 308 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8933> and protein <SEQ ID 8934> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5 
McG: Discrim Score: 5.48 
GvH: Signal Score (-7.5): -2.64 
Possible site: 20 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -4.78 threshold: 0.0 

INTEGRAL Likelihood = -4.78 Transmembrane 8 - 24 ( 7 - 25) 
PERIPHERAL Likelihood = 2.49 259 
modified ALOM score: 1.46

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.2911 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01312(412 - 1338 of 1644)

OMNI|NT01EC0303(55 - 357 of 358) conserved hypothetical protein
%Match = 23.8

%Identity = 46.6 %Similarity = 64.3

Matches = 142 Mismatches = 107 Conservative Sub.s = 54


288 318 34S 378 408 438 468 498 

LISQSFCS*FRL*GLLGIAHNVLGFTSVFHLLFSAIFITNyVTRNGDLM6RFKELLESKKTLILHGALGTELESRGCDVS 

■■: 11= . 1 = 1 ||: I I I I : I I I = : = 

ATOPVLGVMSIQRRELRCGAGYRLLRCAMVLISLLNPETQ^SQNMSQ 
20 30 40 50 60 70 80

528 558 588 618 648 67B 708 738 

GRLWSAICfLIEDP/AMQTIHEDYIR&GADIVTTSTYQATL^^ 

urn i = i = i i= =i n mi mini m i= nm n mm hi i 

30 DSLWSAK^WNPELIREWLDYYRAGAQCAITASYQATPAGFPJUiGLDFAQSKALIGKSTOI/^KTARE AYLAEN 

100 110 120 130 140 150 

768 798 828 858 888 918 948 978 

SERIYPLISGDVGPYAAFIMGSEYTGLYDIDKQGLOTFimHRIELLLDEGVDIl^ETIPmQEAFJVLIELLAEDFPQV 

35 : mi mi miimi i i = == in m in 1 mi mn i m m m 

PQAGTLLVAGSVGPYGAYI^GSEYRGDYHCS^^FOAFHRPR\mALLDAGADLI^CETLPNFSEIEAIAELLT-AYPRA 
170 180 190 200 210 220 230 

1008 1038 1068 1098 1128 1158 1188 1218 

40 FAYMSFTSQDGKTISDGSAVADLAKAIDVSPQWALGINCSSPSLVADFLQAIAEQTNKPLWYPNSGEVYDGASQSWQS 
|: HI :| = :|||= : h = llllllllll ' H = I HI Ullll II 1-1 = 

RAWFSFTLRDSEHLSDGTPLFJ3WALIAGYPQWALGINCIALEOT 

250 260 270 280 290 300 310 

45 1248 1278 1308 1338 1368 1398 1428 1458 

SPDHSHTLLENTSDWQKLGAQWGGCCRTRPADIADLSAKLK*VKYLEEG*GKFDFLFQSTRKPAWILPNGFCFYLSEMT 

= i m n mmnni mn i i 

HGEHCAQLADYLPQWQAAGARLIGGCCRTTPADIAALKARS 
330 340 350 

SEQ ID 8934 (GBS381) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 68 (lane 6; MW 42kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 72 (lane 4; MW 66.9kDa). 

Example 1912 

A DNA sequence (GBSx2021) was identified in S.agalactiae <SEQ ID 5927> which encodes the amino 
acid sequence <SEQ ID 5928>. Analysis of this protein sequence reveals the following:

Possible site: 51 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2995 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1913 

A DNA sequence (GBSx2022) was identified in S.agalactiae <SEQ ID 5929> which encodes the amino
acid sequence <SEQ ID 5930>. Analysis of this protein sequence reveals the following:



N-terminal signal sequence

[INTEGRAL transmembrane predictions, only partly recoverable: segment starts 176, 89, 337, 292, 58, 267, 212; one range 125 - 141]

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9377> which encodes amino acid sequence <SEQ ID 9378> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12034 GB:Z99105 similar to histidine permease [Bacillus subtilis]
Identities = 221/384 (57%), Positives = 291/384 (75%), Gaps = 2/384 (0%)

Query: 2 PVTGSFHTYATKFISPGTGFTVAWLyWICTTVALGTEFLGAAMLMQRWFPNVPAWAFASF 61 

PVTG+FHTYA K+I PGTGFTVAWLYW+ WTVALG+EF A +LMQRWFP+ W +++ 
Sbjct: 76 PVTGAFHTYAAKYIGPGTGFWAV&YWLTVmTAIiGSEFTAAGLLMQRWFPHTSVWMWSAV 135 

Query: 62 FALVIFGLNALSVRFFAEAESFFSSIKVIAIIIFIILGLGAMFGLVSFEGQHKAILFTHL 121 

FAL IF LNA SV+FFAE+E +FSSIKV+AI++FI4-LG AMFG++ +G A + ++ 
Sbjct: 136 FALFIFLIiNAFSWFFAESEFWFSSIKVLAIVLFILLGGSAMFGIIPIKGGEAAPMLSNF 195 

Query: 122 TANGA- FPNGI VAVVSVMLKVNYAF SGTEL IG I AAGE TDNPKBAVPRAIKTTIGRLWFF 180 

TA G FPNG V ++ ML+VN+AFSGTELIGIAAGE+ +P + +P+AIKTT+ RL +FF 
Sbjct: 196 TAEGGLFPNGFVPILMTMLSTOFAFSGTELIGIAAGESVDPDKTIPKAIKTTVWRLSLFF 255 

Query: 181 VLTIWLASLLPMKEAGVSTAPFVDVFDKMGIPFTA0;MNFVILTAILSAGNSGLYASSR 240 

V TI VL+ L+P+++AGV +PFV VFD++G+P+ ADIMNFVILTAILSA NSGLYASSR 
Sbjct: 256 VGTIFVLSGLIPIQDAGVIKSPFVAVFDRVGVPYAADIM5FVILTAILSAANSGLYASSR 315 

Query: 241 MLWSLANFX3MLSKSWKINKHGVPMRALLLS[^GAVLSLFSSIYAADTVYLALVSIAGFA 300 

MLWSL+ E L + K+ G P AL+ SM G +LSL SS++A DTVY+ LVSI+GFA 
Sbjct: 316 MLWSLSKEKTLHPTFAKLTSKGTPFNALVFSMIGGILSLLSSVFAPDTVYVVLVSISGFA 375

Query: 301 VVWVWLAlPVAQINFRKEFLKE-NQLEDLSiKTPFTPVLPYITIILLLISIVGIAVIDSSQ 359 

VWVW+ 1 +Q FRK +++ N++ DL Y+TP P +P +L L S+VGIA+D +Q 
Sbjct: 376 VVVVWMGIAASQFMFRKRYIEAGNKVTDLK^TPLYPFVPIAAFLLCIASWGIAFDPNQ 435 

Query: 360 RAGLYFGVPFIIFCYIYHKLRYKK 383 

R LY GVPF+ CY + ++ +K 
Sbjct: 436 RIALYCGVPFMAICYAIYYVENRK 459 
There is also homology to SEQ ID 4070.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1914 

A DNA sequence (GBSx2023) was identified in S.agalactiae <SEQ ID 5931> which encodes the amino
acid sequence <SEQ ID 5932>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2375 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

There is also homology to SEQ ID 5642. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1915 

A DNA sequence (GBSx2024) was identified in S.agalactiae <SEQ ID 5933> which encodes the amino 
acid sequence <SEQ ID 5934>. Analysis of this protein sequence reveals the following: 
Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4935 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1916 

A DNA sequence (GBSx2025) was identified in S.agalactiae <SEQ ID 5935> which encodes the amino 
acid sequence <SEQ ID 5936>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0530 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1917 

A DNA sequence (GBSx2026) was identified in S.agalactiae <SEQ ID 5937> which encodes the amino 
acid sequence <SEQ ID 5938>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0175 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF63739 GB:AF236863 hypothetical GTP-binding protein
[Lactococcus lactis] 
Identities = 142/193 (73%) , Positives = 165/193 (84%) 

Query: 6 LNIHNASILLSAANKSHYPQDDLPEVALAGRSNVGKSSFINTLLGRKNLARTSSKPGKTQ 65

+NT+N +1 +SAA+K YP++D PE+ALAGP.SNVGKSSFINTLL RKN ARTS +PGKTQ 
Sbjct: 3 I^^IMNLTITISAASKKQYPEIroWPEIAIAGRSNVGKSSFI^^^LIlNR^<NFARTSGQPGKTQ 52 

Query: 66 LLNFYNIDDKLRFVDVPGYGYAKVSKTERAKWGKMIEEYLVTRDNLRVVVSLVDFRHDPS 125

LLNFYNIDD+L FVDVPGYGYA+VSK ER KWGKMIEEYL TR+NL+ VVSLVD RH+PS
Sbjct: 63 LLNFYNIDDQLHFVDVPGYGYARVSKKEREKWGKMIEEYLTTRENLKAVVSLVDIRHEPS 122

Query: 126 ADDIQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESSIKKKLNFDKKDHFIVFSSVDRT 185

DD+ MYEFLKYY IPVI+VATKADK+PRGKWNKHES IKK + FD D FI+FSS D+T
Sbjct: 123 EDDIM^FLKYYHIPVILVATKADKVPRGKHNKHESIIKKAMKFDSTDDFIIFSSTDKT 182 

Query: 186 GLDESWDTILSEL 198 

G4+E+W IL L 
Sbjct: 183 GIEEAWTAILKYL 195 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5939> which encodes the amino acid 
sequence <SEQ ID 5940>. Analysis of this protein sequence reveals the following: 



Possible site: 18

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.0123(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 167/196 (85%) , Positives = 183/196 (93%) 

Query: 3 EEFOTHNASILLSAANKSHYPQDDLPEVAmGRSWGKSSFINTLLGRKNLARTSSKPG 62 

E+ LNIHNASILLSAANKSHYPQDDLPE+ALAGRSOTGKSSFINT+LGRKNLARTSSKPG 
Sbjct: 4 EQVIjNIHffiSILLSAANKSHYPQDDLPEIAIAGRSWGKSSFINTILGRKNLARTSSKPG 63 

Query: 63 KTQLtNFYNIDDKLRFVIlVPGYGYAK^SKTEPAiaraKMIEEYLTORDNLRVWSLVDFRH 122 

KTQLLNF+NIDDKLRFTOVPGYGYAKVSK+ERAKWGKMIEEYL +RDNLR WSLVD RH 
Sbjct: 64 KTQLIiNFFNIDDKLRFVUVPGYGYAKVSKSEPAKWGKMIEEYLTSRDNLRAVVSLVDLRH 123 

Query: 123 DPSADDIQMYEFLKYYEIPVIIVATKADKIPRGKWNKHESSIKKK1NFDKKDHFIVFSSV 182 

PS +DIQMY+FLKYY+IPVI+VATKSDKIPRGKWNKHES +KK LNFDK D FIVFSSV 
Sbjct: 124 APSKEDIQMYDFLKYYDIPVIWATKADK1PRGKJTOEOIESWKKALNFDKSDTFIVFSSV 183

Query: 183 DRTGLDESWDTILSEL 198

+R G+D+SWD IL ++ 
Sbjct: 184 ERIGIDDSWDAILEQV 199 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1918 

A DNA sequence (GBSx2027) was identified in S.agalactiae <SEQ ID 5941> which encodes the amino 
acid sequence <SEQ ID 5942>. This protein is predicted to be protease ClpX (clpX). Analysis of this 
protein sequence reveals the following:

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2389(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9947> which encodes amino acid sequence <SEQ ID 9948> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF63738 GB:AF236863 protease ClpX [Lactococcus lactis] 
Identities = 305/395 (77%), Positives = 357/395 (90%), Gaps = 1/395 (0%) 

Query: 18 NWCSFCGKSQDEVKKIIAGNGVPICNECVALSQEIIKEFJIAEEVLADIjAEVPKPKELLE 77

N+ CSFCGKSQD+VKK+1AG+ V+ICNEC+ LS I++EEL EE +++ EV PKE+ + 
Sbjct: 8 NIQCSFCGKSQDDVKKMIAGSDVYICNECIELSTRILEEELKEEQDSEMLEVKTPKEMFD 67 

Query: 78 ILNQYWGQDRAKRAIAVAVYNHYKRVSYTESS-DDDVDLQKSNILMIGPTGSGKTFLAQ 136
LN+YV+GQ++AKRALAVAVYNHYKR+++T S +D+ +LQKSNI L+ IGPTGSGKTFLAQ

Sbjct: 68 HLNEyVIGQEKAKRAIAVAVYNHYKRINFTASKIAEDIELQKSNILLIGPTGSGKTFLAQ 127

Query: 137 TLAKSLOTPFAIADATSLTFAGYVGEDVENILLKLIQAADYNVERAERGIIYVDEIDKIA 196
TLAKSLNVPFAIADATSLTEAGYVGEDVENILLKL+QA+D+N+ERAERGIIY+DEIDKIA
Sbjct: 128 TLAKSIiNVPFAIADATSLTEAGYVGEDVENILLKLLQASDFNIERAERGIIYIDEIDKIA 187

Query: 197 KKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQINTKNILFIVGGA 256

KK ENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQI+TKNILFIVGGA 
Sbjct: 188 KKSENVS ITRDVSGEGVQQALLKI IEGTVASVPPQGGRKHPNQEMIQIDTKNILFIVGGA 247 

Query: 257 FDGIEDLVKQRLGEKVIGFGQTSRKIDDNASYMQEIISEDIQKFGLIPEFIGRLPWAAL 316 

FDGIE++VKQRLGEK+IGFG ++K+ D SYMQEI I+EDIQKFGLIPEFIGRLP+VAAL 
Sbjct: 248 FDGIEEIVKQRLGEKIIGFGANNKKLSDEDSYMQEIIAEDIQKFGLIPEFIGRLPIVAAL 307 

Query: 317 ELLTAEDLVRILTEPRNALVKQYQTLLSYDGVELEFDQDALLAIADKAIERKTGARGLRS 376

E LT EDL++ILTEP+NAL+KQY+ LL +D VELEF AL+AIA KAIERKTGARGLRS 
Sbjct: 308 ERLTEEDLIQILTEPKNALIKQYKQLLLFDNVEIjEFKDGALMAIAKKAIERKTGARGLRS 367 

Query: 377 IIEETMLDIMFEIPSQEDVTKVRITKAAVEGTDKP 411 
IIEE M+DIMFE+PS E++TKV IT+A V+G +P

Sbjct: 368 I IEE VMMDIMFEVPSHEEITKVI ITEAWDGKAEP 402 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5943> which encodes the amino acid 
sequence <SEQ ID 5944>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.2711(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 378/409 (92%), Positives = 393/409 (95%), Gaps = 1/409 (0%)

Query: 9 MAGNRNNDMNVYCSFCGKSQDEVKKI IAGNGVFICNECVALSQEI IKEELAEEVLADLAE 68 

MAG+R ND+ VYCS FCGKSQD+VKKI IAGN VFICNECVALSQEI IKEELAEEVEiADIi E 
Sbjct: 1 MAGSRTNDI KVYCS FCGKSQDDVKKI IAGNNVFICNECVALSQE 1 1 KEELAEE VLADLTE 60 

Query: 69 VPKPKELLEILNQYWGQDRAKRALAVAVYNHyKRVSYTES-SDDDVDLQKSNILMIGPT 127 

VPKPKELL++LNQYWGQDRAKRAL+VAVYNHYKRVS+TES DDDVDLQKSNILMIGPT 
Sbjct: 61 VPKPKELLDVINQYWGQDRAKRALSVAvYNHYKRVSFTESi»DDDVDLQKSNILMIGPT 120 

Query: 128 GSGKTFIAQTIAKSLNVPFAIADATSLTEAGYVGEDVENILLKLIQAADYNVERAERGII 187 

GSGKTFIAQTIAKSLWPFAIADATSLTEAGYVGEDVENILLIOliIQAADYlJvERAERGII 
Sbjct: 121 GSGKTFLAQTLAKSLNVPFAIADATSLTEAGYVGEDVENILLKLIQAADYNVERAERGII 180 

Query: 188 YVDE IDKI AKKGENVS I TRD V SGEGVQQALLKI I EGTVAS VPPQGGRKHPNQEMI QINTK 247 

YvDEIDKIAKKGENVSITRDVSGEGVQQAL&KIIEGTVASVPPQGGRKHPNQEMIQI+TK 
Sbjct: 181 YVDEIDKIAKKGENVSITRDVSGEGVQQALLKIIEGTVASVPPQGGRKHPNQEMIQIDTK 240 

Query: 248 NILFIVGGAFDGIEDLVKQRLGEKVIGFGQTSRKIDDNASYMQEIISEDIQKFGLIPEFI 307 

NILFIVGGAFDGIE++VKQRLGEKVIGFGQ SRKIDDNASYMQEIISEDIQKFGLIPEFI 
Sbjct: 241 NILFIVGGAFDGIEEIVKQRLGEKVIGFGQNSRKIDDNASYMQEIISEDIQKFGLIPEFI 300 

Query: 308 GRLPWAALELLTAEDLVRILTEPRNALVKQYQTIiLSYDGvELEFDQDALLAIADKAIER 367 

GRLPWAALE L DL++ILTEPRNALVKQYQ LLSYDGVEL FD++AL AIA+KAIER 
Sbjct: 301 GRLPWAALEQIJSrrSDLIQILTEPRNALWQYQALLSYDGVEIiAFDKEALEAIANKAIER 360

Query: 368 KTGRRGLRSIIEETMLDIMFEIPSQ3D\TKVRITKAA''/EGTDKPVLETA 416 

KTGARGLRSIIEETMLDIMFEIPSQEDVT'KVRITKAAVEG KPVLETA 
Sbjct: 361 KTGARGLRSIIEETMLDIMFEIPSQEDVTKVRITKAAVEGKSKPVLETA 409 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1919 

A DNA sequence (GBSx2028) was identified in S.agalactiae <SEQ ID 5945> which encodes the amino 
acid sequence <SEQ ID 5946>. Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1920 

A DNA sequence (GBSx2029) was identified in S.agalactiae <SEQ ID 5947> which encodes the amino 
acid sequence <SEQ ID 5948>. Analysis of this protein sequence reveals the following: 




d N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.4029(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9949> which encodes amino acid sequence <SEQ ID 9950> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC33872 GB:AF055727 dihydrofolate reductase [Streptococcus pneumoniae]
Identities = 83/162 (51%) , Positives = 118/162 (72%) , Gaps = 1/162 (0%) 

Query: 25 MTKQIIAIWAEDEDHLIGWGGLPWRLPKELHHFKETTMGQALLMGRKTFDGMNRRVLPG 84 

MTK+I+AIWA+DE+ LIG LPW LP EL HFKETT+ A+LMGR TFDGM RR+LP 
Sbjct: 1 MTKKIVAIWAQDEEGLIGKENRLPWHLPAELQHFKETTLNHAILMGRVTFDGMGRRLLPK 60 

Query: 85 RETIILTKDEQFQADGVTVLNSVEQVIIO'JFQEHNKTLFIVGGASIYKAFLPYCEAIIKrK 144 

RET+ILT++ + + DGV V+ V+ W+Q+ K L+I+GG I++AF PY + +1 T 

Sbjct: 61 RETLILTRNPEEKIDGVATFQDVQSVLDTOQDQEKNLYIIGGKQIFQAFEPYLDEVIVTH 120 

Query: 145 VHGKFKGDTYFP -DVNLSEFKVI SRDYFEKDEQNAHAFTVTY 185 

+H + 4GDTYFP +++LS F+ +S ++ KDE+N + FT+ Y 
Sbjct: 121 IHARVEGDTYFPEELDLSLFETVSSKFYAKDEKNPYDFTIQY 162 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5949> which encodes the amino acid 
sequence <SEQ ID 5950>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1214(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 82/160 (51%), Positives = 119/160 (74%)

Query: 25 MTKQIIAIWAEDEDHLIGVNGGLPWRLPKELHHFKETTMGQALLMGRKTFDGMNRRVLPG 84

MTK+IIAIWAEDE LIG+ G LPW LPKEL HFK+TT+ QA+LMGR TF+GMN + LP 
Sbjct: 1 MTKEIIAIWAEDEAGLIGIAGKLPWYLPKELEHFKKTTLHQAILMGRVTFEGMNCKRLPQ 60 

Query: 85 RETIILTKDEQFQADGVTVLNSVEQVIKl'JFQEHNKTLFIVGGASIYKAFLPYCEAIIKTK 144 

R+T+++T++ +Q D V + S+E+V++W+ +KTL+I+GG + +AF Y + IIKT 
Sbjct: 61 RCTLVMTRNRDYQVDEVLTMTSIEKVLEWYHAQDKTLYIIGGNKVLEAFNGYFDRIIKTV 120 

Query: 145 VHGKFKGDTYFPDVNLSEFKVISRDYFEKDEQNAHAFTVT 184 

+H +FKGDTY P+++ S F S+ ++ +D +N + FTVT 
Sbjct: 121 IHHRFKGDTYRPNLDFSHFTQESQTFYARDAKNPYDFTVT 160 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1921 

A DNA sequence (GBSx2030) was identified in S.agalactiae <SEQ ID 5951> which encodes the amino 
acid sequence <SEQ ID 5952>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1577(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA25221 GB:M33770 thymidylate synthase (EC 2.1.1.45)
[Lactococcus lactis]
Identities = 215/280 (76%), Positives = 245/280 (86%), Gaps = 2/280 (0%) 

Sbjct: 1 MTYADQVFKQNIQNILDNGVFSENARPKYKDGQI'lANSKYVTGSFvTYDLQKGEFPITTLR 60 

Query: 61 PIPIKSAIKEIFWIYQDQTNDIAVLIClKYGvTYWNDlffiVGHTC 120 

PIPIKSAIKE+ WIYQDQT++L+VL +KYGV YW +W +G GTIGQRYGA VKK+NII 
Sbjct: 61 PIPIKSAIKELWIYQDQTSELSVLEEKYGVKY1TOHWGIGD-GTIGQRYGATVKKYNIIG 119 

Query: 121 KLLKQLEDNPWNRKWISLWDYEAFEETEGIiliPCAFQTMFDVRRV-NGELYLDATLTQRS 179 

KLL+ L NPWNRRN+I+LW YE FEETEGLLPCAFQTMFDVRR +G++YLDATL QRS 
Sbjct: 120 KLLEGIAKNPWNRRNIINLWQYEDFEETEGLLPCAFQTMFDVRREKDGQIYLDATLIQRS 179 

Query: 180 KEMLVAHHINAMQYVALQMMIAKHFGWRVGKFFYFINNLHIYDNQFEQAQELLKRQPSEC 239 

NDMLVAHHINAMQYVALQMMIAKHF W+VGKFFYF+NNLHIYDNQFEQA EL+KR SE 
Sbjct: 180 ^MLVAHHINAMQYVALQMMIAKHFSWKVGKFFYFVNNLHIYDNQFEQANELMKRTASEK 239 

Query: 240 NPKLVLNVPDGTDFFDIKPDDFALVDYDPIKPQLRPDLAI 279 

P+LVLNVPDGT+FFDIKP+DF LVDY+P+KPQL+FDLAI 
Sbjct: 240 EPRLVTJNTVPDGTNFFDIKPEDFELVDYEPVKPQLKFDLAI 279 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5953> which encodes the amino acid 
sequence <SEQ ID 5954>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3131(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 227/279 (81%) , Positives = 251/279 (89%) 

Query: 1 MTKADLLFKDNITKIMSEGVFSEQARPRYKNGEMANSKYITGAFAEYDLSKGEFPITTLR 60
MTKAD +FK NI KI++EG SEQARP+YK+G A+SKYITGAFAEYDL+KGEFPITTLR
Sbjct: 9 MTKADQIFKANIQKI INEGSLSEQARPKYKDGRTAHSKYITGAFAEYDLAKGEFPITTLR 68

Query: 121
K+LKQL +NPWNRRNVISLWDYEAFEET+GLLPCAFQ MFDVRRV +LYLDA+LTQRSN
Sbjct: 129

Query: 241
PKLVLNVPD T+FFDIKPDDF L +YDP+KPQL FDLAI
Sbjct: 249

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1922 

A DNA sequence (GBSx2031) was identified in S.agalactiae <SEQ ID 5955> which encodes the amino 
acid sequence <SEQ ID 5956>. This protein is predicted to be HMG-CoA synthase. Analysis of this protein 
sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0816(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5957> which encodes the amino acid 
sequence <SEQ ID 5958>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1670(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 260/385 (67%) , Positives = 325/385 (83%) 

Query: 36 MKIGIDKIGFATSQYVLEMTDLAIARQTOPEKFSKGLLLDSLSITPVTEDIVTLAASAAN 95 

M IGIDKIGFATSQYVL++ DIA+ARQVDP KFS+GLL++S S+ P+TEDI+TIAASAA+ 
Sbjct: 14 MTIGIDKIGFATSQYVLKLEDLALARQVDPAKFSQGLLIESFSVAPITEDIITLAASAAD 73 

Query: 96 D I LSDEDKETI DMVT VATESS I DQSKAASVYVHQLLEIQPFARSFEMKEACYSATAALDY 155 

IL+DED+ IDMVI+ATESS DQSKA+++YVH L+ I QPFARS FE+K+ACYSATAALDY 
Sbjct: 74 QILTDEDPAKIDMVILATESSTDQSKASAIYVHHLVGI QPFARS FEVKQACYSATAALDY 133 

Query: 156 AKLHVEKHPDSKVLVIASDIAKYGIKSTGESTQGAGSIAMLISQNPSILELKEDHLAQTR 215 

AKLHV PDS+VLVIASDIA+YG+ S GESTQG+GSIA+L++ NP IL L ED++AQTR 
Sbjct: 134 AKLHVASKPDSRVIVIASDIARYGVGSPGESTCGSGSIALLVTANPRILAIJIEDNVAQTR 193 

Query: 216 DIMDFWRPNYSDVPYVNGMFSTKQYLDMLKTTWKVYQKRFNTSLSDYAAFCFHIPFPKLA 275 

DIMDFWRPNYS PYV+G++STKQYL+ L+TTW+ YQKR N LSD AA CFHIPFPKLA 
Sbjct: 194 DIITOFWRPNYSFTPYVDGIYSTKQYmCLETTWQAYQKREI&QLSDIAAVCFHIPFPKLA 253 

Query: 276 LKGFNKILDNNLDEQKKAELQENFEHSITYSKKIGNCYTGSLYLGLLSLLENSQNLKAGD 335 

LKG N I+DN + + + +L E F+ SI+YSK+IGN YTGSLYLGLLSLLENS+ L++GD 
Sbjct: 254 LKGLNNIMDNTVPPEHREKLIEAFQASISYSKQIGNIYTGSLYLGLLSLLENSKVLQSGD 313 

Query: 336 QlAFFSYGSGAVAEIFTGQLvUGYQNKLQSDRMDQmKRQKITVTEYEKLFFEKTILDEN 395 

+1 FFSYGSGAV+E ++GQLV GY L ++R L++R +++V++YE LF+E+ LD+N 
Sbjct: 314 KIGFFSYGSGAVSEFYSGQLVAGYDKMMTl^QALLDQRTRLSVSKYEDLFYEQVQLDDN 373 

Query: 396 GNANFNTYRTGTFSLDSICEHQRIY 420 

GNANF+ Y TG F+L +1 EH+RIY 
Sbjct: 374 GNANFDIYLTGKFALTAIKEHRRIY 398 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 






Example 1923 

A DNA sequence (GBSx2032) was identified in S.agalactiae <SEQ ID 5959> which encodes the amino
acid sequence <SEQ ID 5960>. This protein is predicted to be HMG-CoA reductase (mvaA). Analysis of 
this protein sequence reveals the following: 



Possible site: 50

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.49 Transmembrane 
INTEGRAL Likelihood = -1.33 Transmembrane 



Final Results

bacterial membrane --- Certainty=0.1595(Affirmative) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG02454 GB:AF290098 HMG-CoA reductase [Streptococcus pneumoniae] 
Identities = 266/421 (63%) , Positives = 343/421 (81%) , Gaps = 3/421 (0%) 

Query: 3 KISWTGFSKKSPEERIHYLEEQDFLADSSLEIVTNQDLLSLSLANQMAENVIGRIALPPS 62 
KISW GFSKKS +ER+ L+ Q L+ + + +S+++A+Q++ENV+G +LP+S

Sbjct: 2 KISWNGFSKKSYQERLELLKAQALLSPERQASLEKDEQMSVTVADQLSENWGTFSLPYS 61

Query: 63 LVPDVLVNGKVyQVPYVTEEPSVVAAASFAAKIIKRSGGFLTTVHNRKMIGQVALYDVQD 122
LVP+VLTOG+ Y VPYVTEEPSWAAAS+A+KI IKR+GGF VH R+MIGQVALY V +
Sbjct: 62 LVPEVLVNGQGYTVPYVTEEPSWAAASYASKIIKRAGGFTAQVHQRQMIGQVALYQVAN 121

Query: 123 SQHTKESILNQKQQLLEIANAAHPSIVKRGGGACDLTIEI KEDFLIVYLMVDTKEftM 179 

+ +E I +-+K +LLE+AN A+PS I VKRGGGA DL +E 4 DFL+VY+ VDT+EAM 
Sbjct: 122 PKLAQEKIAS KKAELLELANQAYPS I VKRGGGARDLHVEQIKGEPDFLWYIHVDTQEAM 181

Query: 180 GAWWNTMI1EALSSPLEDISKGKSLMSILSNYATESLVTATCRVDLRFLSRQKEEAIKLA 239 

GANM+NTM+EAL LE++S+G+SLM ILSNYAT+SLVTA+CR+ R+LSRQK++ ++A 
Sbjct: 182 GANMLNTMLEALKPVLEELSQGQSLMGILSNYATDSLVTASCRIAFRYLSRQKDQGREIA 241 

Query: 240 QKMTMASQIAQVDPYRASTHNKGIFNGIDAIVLATGNDWRA1EAGAHTYAVKDGQYRGLS 299

+K+ +ASQ AQ DPYRA+THNKGIFNGIDAI++ATGNDWRAIEAGAH +A +DG+Y+GLS
Sbjct: 242 EKIALASQFAQADPYRAATHNKGIFNGIDAILIATGNDWRAIEAGAHAFASRDGRYQGLS 301

Query: 300 RWSYKVDDNCLEGTLTLPMPVATKGGSIGINPSVHLAHDLLGRPNAKELASIILSIGLAQ 359 
W+ ++ L G +TLPMPVATKGGSIG+NP V L+HDLLG P+A+ELA II+SIGLAQ

Sbjct: 302 CWTLDLEREELVGEMTLPMPVATKGGSIGLNPRVALSHDLLGNPSARELAQIIVSIGLAQ 361 

Query: 360 NFAALKALVSTGIQAGHMKLQAKSLALIAGAKEEQISEWKQLLDSKHM 42 D 

NFAALKALVSTGIQ GHMKLQAKSLALLAGA E +++ +V++L+ K NLETAQ+ + L 
Sbjct: 362 NFAALKALVSTGIQQGHMKLQAKSLALLAGASESEVAPLVERLISDKTFNLETAQRYLENL 422

A related DNA sequence was identified in S.pyogenes <SEQ ID 5961> which encodes the amino acid
sequence <SEQ ID 5962>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3929(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 257/422 (60%) , Positives = 330/422 (77%) 



LL + ANQM ENV+GR+ALPF
Sbjct: 4 !TI]^SGFSKKTFEERLQLIEKPKLimEiraQLKTDVI.LPIQTMIQMTE!WLGRIiALPF 63

Query: 62 SLVPDVLWGKVYQVPYVTEEPSWaAMFAMIIKRSGGFLTTVHNRKMIGQVALYDVQ 121 
S+ PD LVNG YQ+P+VTEEPSWAAASFAAK+IKRSGGF NR+MIGQ+ LYD+ 

Sbjct: 64 SIAPDFLWGSTyQMPFVTEEPSWaAASFAMlIKRSGGFKAQTLNRQMIGQIVLYDID 123

Query: 122 DSQHTKES I ENQKQQLLE I ANAAHPSIVKRGGGACDLTI E I KEDFLI VYLMVDTKEAMGA 181 

+ K +IL++ ++L+ +AN A+PSIVKRGGGA + +E K +FLI YL VDT+EAMGA 
Sbjct: 124 QIDNAKAAILHKTKKLIALANKAYPSIVKRGG3ARTIHLEEXGEFLIFYLTVDTQEAMGA 183 

Query: 182 IMVOTMMEALSSPLEDISKGKSLKSILSNYATESLVTATCRTOLRFLSRQKEEAIKLAQK 241 

NMVNTMMEAL L +SKG LM+ILSNYATESLVT +C + +R L K ++++LAQK 
Sbjct: 184 MMVNTMMEALVPDLTRLSKGHCLMAILSl^ATESLVTTSCEIPVRLLDHDKTKSLQLAQK 243 

Query: 242 MTmSQIAQVDPYRASTHNKGIFNGIDAIVIATGbroWRAIEAGAHTYAVKDGQYRGLSRW 301

+ +AS+LAQVDPYRA+THNKGIFNGIDA+V+ATGKDI'IRAIEAGAH YA ++G Y+GLS+W 
Sbjct: 244 IELASRLAQVDPYRATTHNKGIFNGIDAWIATGNDWRAIEAGAHAYASRNGSYQGLSQW 303 

Query: 302 SYKVDDNCLEGTLTLPMPVATKGGSIGINP8VHLAHDLLGRPNAKELASIILSIGLAQNF 361 
+ D L G +TLPMP+A+KGGSIG+NP+V +AHDLL +P+AK LA +1 S+GLAQNF

Sbjct: 304 HFDQDKQVLLGQMTLPMPIASKGGSIGLNPTVSIAHDLLNQPDAKTLAQLIASVGLAQNF 363 

Query: 362 AALKALVSTGIQAGHMKLQAKSIALIAGAKEEQISEVWQLLDSKHMNLETAQKIVNECLT 421 
AALKAL S+GIQAGHMKL AKSLALLAGA +++I+ +V LL K +NLE A +++L 
Sbjct: 364 AALKALTSSGIQAGHMKLHAKSLALLAGATQDEIAPLVNALLADKPINLEKAHFYLSQLR 423

Query: 422 KS 423 
+S 

Sbjct: 424 QS 425 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1924 

A DNA sequence (GBSx2033) was identified in S.agalactiae <SEQ ID 5963> which encodes the amino
acid sequence <SEQ ID 5964>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2355(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5965> which encodes the amino acid
sequence <SEQ ID 5966>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2687(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 76/138 (55%) , Positives = 100/138 (72%) , Gaps = 2/138 (1%) 



Query: 7 PKWEELPELDIiYIiDQVLLYTOQLINPKTITITOKLLTASMINNYvKHNYISKPIKKKYNRR 66 
P W++LP+LDLYLDQVLLYVNQ + ++++K LTASMINNYVKH Y++KPIKKKY ++
Sbjct: 7 PYWHJIiPDLDLYIjDQVLLYVNQCTDFSEVSDNKSLTASMINNWKHGYVTKPIKKKyQKQ 66

Query: 67 QVAMIVITAPKQVFAIQEISQTLELLTADJIHSEEa«3GFAACMNKEE--VHDLPPWIS 124 

Q+ARLI 1+ FK VF IQ+IS+ LE L A SE BI F C Ht+ D+PP+V 
Sbjct: 67 QLARLIAISLFKTVFPIQDISRVLEELQAQADSESLYI3TFVTCWNQKAPIEEDIPPIVQV 126 

Query: 125 ACQTLNLYQETQKLVLEL 142

ACQT+ Y +T L+ E+ 
Sbjct: 127 ACQTVKDYHKTIYLLQEV 144 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1925 

A DNA sequence (GBSx2034) was identified in S.agalactiae <SEQ ID 5967> which encodes the amino 
acid sequence <SEQ ID 5968>. This protein is predicted to be hemolysin iii. Analysis of this protein 
sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence
Likelihood =   Transmembrane 142 - 158 ( 140 - 165)
Likelihood = - Transmembrane 26 - 42 ( 19 -
Likelihood = - Transmembrane 200 - 216 ( 196 -
Likelihood = - Transmembrane 104 - 120 ( 102 -
Likelihood = - Transmembrane 51 - 67 ( 49 -
Likelihood = - Transmembrane 172 - 188 ( 169 -



Final Results 

bacterial membrane --- Certainty=0.4630(Affirmative) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9951> which encodes amino acid sequence <SEQ ID 9952> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 17 EELANSITHA.VGALLMLILLPITAVYSHHHFGLaAALGTSIFVTSLFLMFLSSSiyHSMT 76 

EE+AU+ITH +GA+L + LI +++ H A + +++ S+FL++L S++ HS+ 
Sbjct: 14 EEIANAITHGIGAILSIPALIILIIHASKHGTASAVVAFTVYGVSMFLLYLFSTLLHSIH 73 

Query: 77 YNSLQKYVLRMIDHSMIYIAIAGSyTPVALSLlGGWLGYLIIFLQWGITLFGILYKIFAP 136 

+ ++ k + ++DHS IY+ IAG+YTP L + G LG+ ++ + W + + GI++KIF 
Sbjct: 74 HPKVEK-LFTILDHSAIYLLIAGTYTPFLLlTLRGPLGWTLLAIIWrLAIGGIIFKIFFV 132 

Query: 137 KIKDKFSLVLYLIMGWLVIF-IFPAIITKTGPAFWGLLLAGG1CYTIGALFYA-RKRPYD 194 

+ K S + Y+IMGWL+I IP TG F LLLAGGI Y++GA+F+ K P++ 

Sbjct: 133 RRFIKASTLCYI IMGWLI I VAI KPLYENLTGHGF - SLLLAGGILYSVGAI FFLWEKLPFN 191 

Query: 195 HMIWHLFILLASILQYIGIVYFML 218 

H IWHLF+L S + + +++++L 
Sbjct: 192 HAIWHLFVLGGSAMMFFCVLFYVL 215 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5969> which encodes the amino acid
sequence <SEQ ID 5970>. Analysis of this protein sequence reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10. SI Transmembrane 144 - 160 ( 138 - 163) 
INTEGRAL Likelihood = -9.87 Transmembrane 49 - 65 ( 45 - 71)
INTEGRAL Likelihood = -7.11 Transmembrane 198 - 214 ( 193 - 215)
INTEGRAL Likelihood = -6.16 Transmembrane 102 - 118 ( 100 - 120)
INTEGRAL Likelihood = -2.97 Transmembrane 20 - 36 ( 20 - 41)
INTEGRAL Likelihood = -1.01 Transmembrane 167 - 183 ( 167 - 185)

bacterial membrane --- Certainty=0.5203(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA58877 GB:X84058 novel hemolytic factor [Bacillus cereus] 
Identities = 82/204 (40%), Positives = 128/204 (62%), Gaps = 4/204 (1%) 

Query: 15 EEVANSVTHAlGAFiMLILLPISaSYAYQTYDLPCAAIGISIWISLFLMFLSSTIYHSMR. 74

EE+AN++TH 1GA + L I +A + A + +++ +S+FL+4-L ST+ HS+- 

Sbjct: 14 KEIANAITHGIGAILSIPALIILI IHASKHGTASAWAFTVYGVSMFLLYLFSTLLHSIH 73 

Query: 75 YGSVHKYILRIlDHSMIYIAIAGSYTPWiSLVSGWLGYIIIVLQf/GITLFGILYKlFAK 134 

+ V K + I+DHS 1Y+ IAG+YTP L + GLG++++W+ + GI++KIF 
Sbjct: 74 HPKVEK-LFTILDHSAIYLLIAGTYTPFLLITLRGPLGWTLLAIIVJTLAIGGIIFKIFFV 132 

Query: 135 RINEKFSLMLYIVMGWL-WFILPVIIQKTStiAFGLLMLFGGLSYTIGAVFYA-KKEPYF 192 

R K S + YI+MGWL +V I P+ T F LL L GG+ Y++GA+F+ +K P+ 
Sbjct: 133 RRFIKASTLCYIIMGWLIIVAIKPLYENLTGHGFSLL-LAGGILYSVGAIFFLWEKLPFN 191 

Query: 193 HMIWHLFILLASALQFIAITFFML 216 

H IWBLF+L SA+ F + F++L 
Sbjct: 192 HAIWHLFVLGGSAMMFFCVLFYVL 215 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/213 (71%) , Positives = 181/213 (84%) 

Query: 6 SIKLSPQLSFGEEIANSITHAVGALLMLILLP1TAVYSHNHFGLQAALGTSIFVTSLFLM 65 

+ K S LSF EE+ANS+THA+GA MLILLPI+A Y++ + L+AA+G SIFV SLFLM 
Sbjct: 4 TFKQSLPLSFSEEVANSTCIAIGAFAMLILLPISASYAYQTYDLKAAIGISIFVISLFLM 63 

Query: 66 FLSSSIYHSMTYNSLQKYVLRMIDHSMIYIAIAGSYTPVALSLIGGWLGYLIIFLQWGIT 125 

FLSS+IYHSM Y S+ KY+LR+IDHSMIYIAIAGSYTPVALSL-t GWLGY+II LQWGIT 
Sbjct: 64 FLSSTIYHSMAYGSVHKYILRI IDHSMIYIAIAGSYTPVALSLVSGWLGYI 1 1 VLQWGIT 123 

Query: 126 LFGILYKIFAPKIHDKFSLVLYIjIMGWLVIFIFPAIITKTGPAFWGLLLAGGICYTIGAL 185 

LFGILYKIFA + IN+KFSL+LY++MGWLV+FI P II KT AF L+L GG+ YTIGA+ 
Sbjct: 124 LFGILYKrFAKRINEKFSLMLYIVMGKLWF-L,PVIIQ!CrSLAFGLLMLFGGLSYTIGAV 183

Query: 186 FYARKRPYDHMIWHLFILLASILQYIGIVYFML 218 

FYA+KRPY HMIWHLFILLAS LQ+I I +FML 
Sbjct: 184 FYAKKRPYFHMIWHLFILLASALQFIAITFFML 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1926 

A DNA sequence (GBSx2035) was identified in S.agalactiae <SEQ ID 5971> which encodes the amino 
acid sequence <SEQ ID 5972>. Analysis of this protein sequence reveals the following: 

:> N- terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3641(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>






The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12492 GB:Z99107 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 81/302 (26%) , Positives = 157/302 (51%) , Gaps = 10/302 (3%) 

Query: MKSAYIFFNPKSGKDEQALAKEVKSYLIEHD?QDDY- VRI ITPSSVEEAVAIAKKASEDH
MK A I +NP SG++ + K+ + +++ Q Y + +A AK+A+
Sbjct: 1 MKRARI IYNPTSGRE 1 FKKHDAQVLQKFEQAGYETSTHATTCAGDATHAAKEAALRE 57

DL+I GGDGTIN++ G+ " PT+G++P GT N+F++AL IP+E L A + ++N
Sbjct: FDLIIAAGGDGTIl^VVNGLAPLDiJRPTLGVIPVGTTlJDFARALGIPREDILKAADTVIN

Query: 119 GIOTSVDICKVISroDYMISSLTLGLl^lAAWrSEMICRKLGPFAFLGDAYRILKRNRSYS 178
G + +DI +VN Y 1+ G L ++ +V S++K LG A+ +L R
Sbjct: 118 GVARPIDIGQVNGQYFINIAGGGRLTELTYDVPSKLKTMLGQLAYYLKGMEMLPSLRPTE 177

Query: ITLAYDNNVRSLRTRLLLITMTNSIAGMPAFSPEATIDDGLFRVYTMEHIHFFKLLLHLR 238
+ + YD + L L+T+TNS+ G +P+++++DG+F + ++ 4 + +
Sbjct: 178 VEIEYDGKLFC^EIMLFLVTLTNSVGGFEKIAPDSSLfflDGMFDLMILKKANLAEFIRVAT 237

Query: 239 QFRKGDFSQAKEIKHFHTNNLTISTFKRKKSAIPKVRIDGDPGDQLPVKVEVIPKALKFI 298
+G+ + I + N + ++ ++ ++ +DG+ G LP + + + + +
Sbjct: 238 MALRGEHINDQHIIYTKANRVKVNVSEKM QLNLDGEYGGM1PGEFVNLYRHIHW 292

Query: 299 IP 300
Sbjct: 293 MP 294





A related DNA sequence was identified in S.pyogenes <SEQ ID 5119> which encodes the amino acid
sequence <SEQ ID 5120>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4258(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 172/300 (57%) , Positives = 229/300 (76%) 





Query: 1 MKSAYIFFWPKSGKDEQAIAKEVKSYLIEHDFQDDYVRIITPSSVEEAVALAKKASEDHI 60
MK+ IF+NP SGK E LA++VK Y +H F +D V++ITP ++A LAK+A++D I
Sbjct: 1 MKTWIFYNPNSGKKESQLARQVKDYFCQHGFSEDSVK7ITPKDADQAFQLAKQAAKDKI 60

Query: 61 DLVT PLGGDGTINKI CGGVYAGGAYPTI GLVPAGTVNNFSKALNI PQERNLALENLLNGH 120
DLVI PLGGDGT+NKI GG+Y GGA+ IGLVP+GTVNNF+KA++IP + AL+ +L G
Sbjct: 61 DLVIPLGGDGTLNKIIGGIYEGGAHCLIGLVPSGTVHNFAKAMHIPLQITEALDTILTGQ 120

Query: 121 VICSVDICKVNDDYMISSLTLGLLADIAANVTSEMKRKLGPFAFLGDAYRILKRNRSYSIT 180
+K VDICK N YMISSLTLGLLADIAA+VT+E KR+ GP AFL D+ RILKRNRSY+I+
Sbjct: 121 IKQvDICKANQQYMISSLTLGLLADIAADVTAEEKRRFGPLAFLKDSIRILKRNRSYAIS 180

Query: 181 IAYDNNVRSLRTRLLLITMTNSIAGMPAFSPEATIDDGLFRVYTMEHIHFFKLLLHLRQF 240
L N+ L+T+ LLITMTN+IAG P+FSP A DDG F+VYTM+ + FFK L H+ F
Sbjct: 181 LISHNHRIHLKTKFLLITMTNTIAGFPSFSPGAQADH5YFQVYTMKKVSFFKFLWHINDF 240

Query: 241 RKGDFSQAKEIKHFHTNNLTISTFKRKK3AIPCTRIDGDPGDQLPVKVEVIPKALKFIIP 300
++GDFS+A+EI HF N L++ +K++ +P+ RIDGD D LP+++++IPKA+ I+P
Sbjct: 241 KQGDFSKAEEISHFQANTLSLLPQAKKQAILPRTRIDGDKSDYLPIQLDI I PKAVS I IVP 300

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1927 

A DNA sequence (GBSx2036) was identified in S.agalactiae <SEQ ID 5973> which encodes the amino 
acid sequence <SEQ ID 5974>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3628(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB10885 GB:AB010693 gene_id:K21C13.21~pir||T04769~strong
similarity to unknown protein [Arabidopsis thaliana]
Identities = 85/291 (29%) , Positives = 150/291 (51%) , Gaps = 28/291 (9%) 



Query: 58 PSDIGLIFSQEEQVTGAKSLRDIYHLTDPTYQGPYTIPILIDKTDNRIVCKESADL 113 

SD L ++P+ + GAKS+R++Y + P Y+G YT+P+D DK +V ES+++ 
Sbjct: 89 WFPDSDTELPGAEPDYLNGAKSTOELYEIASPNYEGKYWPVLPTOKKLKTVvNNESSEI 148 

Query: 114 LRLFTTDFSDLHQEDAPVLFSQETAStilDNDIKDINKNFQSLMYKIAFLDKQRDYDTYSK 173 

+R+F T+F+ + + + L+ +1+ + + +YK F KQ Y+ 

Sbjct: 149 IRMFLITEFNGIAKTPSLDLYPSHLRDVINETNGWVFNGINNGVYKCGFARKQEPYNEAVN 208 

Query: 174 EFFTFLDQKEHLLGQRPFLLGDNLSEVDIHFFTPLVRWDIAGRDLLLLNQKALEDYPNIF 233 

+ + +D+ E +LG++ ++ G+ +E Dl F L+R+D N++ L +YPNIF 

Sbjct: 209 QLYEAVDRCEEVLGKQRYICGNTFTEADIRIiFVTLIRFDEVYAVHFKCNKRLLREYPNIF 268 

Query: 234 SWAKTL YND FNLKTLTWPQS I KNTJYY LGKFGRAVRHHTIVPTGPNM 279 

++ K +Y + + N + IK +YY + FG I+P GPN+ 

Sbjct: 269 NYIKDIYQIHGMSSTVNMEHIKQHYYGSHPTINPFG IIPHGPNI 312 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1928 

A DNA sequence (GBSx2037) was identified in S.agalactiae <SEQ ID 5975> which encodes the amino 
acid sequence <SEQ ID 5976>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2647(Affirmative) < succ>

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>

bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07793 GB:AB037666 hypothetical protein [Streptomyces sp. 
CL190]
Identities = 127/331 (38%), Positives = 194/331 (58%), Gaps = 9/331 (2%)

Query: 4 RKDDHIKYALKYQSHY NSFDDIELIHSSLPKYNVNDIDLSTHFAGQSFEFPFYINAM 60 

RKDDH++ A++ + + N FDD+ +H +Ii + D+ L+T FAG S++ P YINAM 
Sbjct: 6 RKDDHWLAIEQHNAHSGRNQFDDVSFVHHALAGIDRPDVSIATSFAGISWQVPIYINAM 65 

Query: 61 TGGSEKGKAv^KIAQVAQATGIVM\T?GSYSAALKNDE--DDSYPTTDLYPDLKLATNIG 118 

TGGSEK +N LA A+ TG+ + +GS +A +K+ D D P+ + NI 

Sbjct: 66 TGGSEKTGLINRDLATAARETGVPIASGSMNAYIKDPSCADTFRVLRDENPNGFVIANIN 125 

Query: 119 LDKPVPAAESTVKANMPIFLQVHVNVMQELLMPEGEREFHMWRSHLKEYVDNIQCPLILK 178 

V A+ + + LQ+H+N QE MPEG+R F W +++ + P+I+K 
Sbjct: 126 ATTTVDNAQRAIDLIEANALQIHINTAQETPMPEGDRSFASWVPQIEKIAAAVDIPVIVK 185 

Query: 179 EVGFGMDLQSIKDAYDIGITTVDISGRGGTSFAYIENQRGR--DRSYLNTWGQTTAQSLI 236

EVG G+ Q+I D+G+ D+SGRGGT FA I EN R D ++L+ WGQ+TA L+ 
Sbjct: 186 EVGNGLSRQTILLLADLGVQAADVSGRGGTDFARIENGRRELGDYAFLHGWGQSTAACLL 245 

Query: 237 NAQS^DK^IIASGGIRHPLDMVKCLVLGAKAVGLSRTVLELVERYPVDDVIAILNSWK 296 

+AQ + + +liASGG+RHPLD+V+ L LGA+AVG S L + VD +1 Jj +W 
Sbjct: 246 DAQDI - - SLP VIiASGGVRHPLD VWALALGARAv'GSSAGFLRTLMDDGVDALITKLTTWL 303 

Query: 297 EDLRMIMCALNCKKITDLRQVNYILYGQLKE 327 

+ L+ L + DL + + +L+G+L++ 
Sbjct: 304 DQLAALQTMLGARTPADLTRCDVLLHGELRD 334 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5977> which encodes the amino acid 
sequence <SEQ ID 5978>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2823 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/329 (74%) , Positives = 284/329 (86%) 

Query: 1 MTNRKDDHIKYALKYQSHYNSFDDIELIHSSLPKYNVNDIDLSTHFAGQSFEFPFYINAM 60 

MTNRKDDHIKYALKYQS YN+FDDIELIH SLP Y+++DIDLSTHFAGQ F+FPFYINAM
Sbjct: 31 MTNRKDDHIKYALKYQSPYNAFDDIELIHHSLPSYDLSDIDLSTHFAGQDFDFPFYINAM 90 

Query: 61 TGGSEKGKAV^KlAQVAQATGIvMVTGSYSAALKNDEDDSYPTTDLYPDLKLATNIGLD 120 

TGGS+KGKAVN KIA+VA ATGIVMVTGSYSAALKN DDSY ++ +LKLATNIGLD 
Sbjct: 91 TGGSQKGKAVNEKLAKVAAATGIVMVTGSYSAALKNPNDDSYRLHEVADNLKLATNIGLD 150

Query: 121 KPVPAAESTVKAMNPIFLQVHVNVMQELLMPEGEREFHKWRSHLKEYVDNIQCPLILKEV 180

KPV + TV+ M P+FLQVHVNVMQELLMPEGER FH W+ HL EY I P+ILKEV
Sbjct: 151 KPVALGQQTVQEMQPLFLQVHVNVMQELLMPEGERVFHTWKKHLAEYASQIPVPVILKEV 210

Query: 181 GFGMDLQSIKDAYDIGITTVDISGRGGTSFAYIENQRGRDRSYLNTWGQTTAQSLINAQS 240

GFGMD+ SIK A+D+GI T DISGRGGTSFAYIENQRG DRSYLN WGQTT Q L+NAQ
Sbjct: 211 GFGMDVNSIKEAHDLGIQTFDISGRGGTSFAYIENQRGGDRSYLNDWGQTTVQCLLNAQG 270

Query: 241 I^DKMDILASGGIRHPLDMVKCLvTjGAKAVGLSRTVLELvERYPVDDVIAILNSWKEDLR 300 

+MD+++IIASGG+RHPLDM+KC VLGA+AVGLSRTVLELVE+YP + VIAI+N WKE+L+ 
Sbjct: 271 LMDQVEI]^SGGvlUIPLDMIKCFvl I GARaVGLSRTVLELVEICYPTERVIAIWGWKEE 330 

Query: 301 MIMCALNCKKITDLRQVNYILYGQLKEAN 329 

+IMCAL+CK I +L+ V+Y+LYG+L++ N 
Sbjct: 331 IIMCALDCKTIKELKGVDYLLYGRLQQVN 359 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1929 

A DNA sequence (GBSx2038) was identified in S.agalactiae <SEQ ID 5979> which encodes the amino 
acid sequence <SEQ ID 5980>. This protein is predicted to be phosphomevalonate kinase. Analysis of this 
protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have no N-terminal signal sequence



Final Results ----- 

bacterial cytoplasm --- Certainty=0.0785 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG02457 GB:AF290099 phosphomevalonate kinase [Streptococcus pneumoniae] 
Identities = 170/330 (51%) , Positives = 233/330 (70%) , Gaps = 1/330 (0%) 

Query: 1 MVKVQTGGKLYIAGEYAILYPGQVAILKNVPIWAIA7FADNYSLYSDMFNYTASLQPD 60 

M+ V+T GKLY AGEYAIL PGQ+A++K++PIYM A F+D+Y +YSDMF++ L+P+ 
Sbjct: 1 MIAVKTCGKLYWAGEYAILEPGQLALIKDIPIYMRAEIAFSD3YRIYSDMFDFAVDLRPN 60 

Query: 61 KQYSLIQETILLMEEWLINFGKNIKPIHLEITGKLERYGLKFGIGSSGSWVLTIKAMAA 120 

YSLIQETI LM ++L G+N++P L+I GK+ER G KFG+GSSGSVWL +KA+ A 
Sbjct: 61 PDYSLIQETlALMGDFIiAWGQNLRPFSLKICGroiEREG^FGLGSSGSVVVLVVKALLA 120 

Query: 121 LYEIEMPSDLLFKLSAYVTjLKRGDNGSMGDIACIAYEHLISYSAFDRRAVSKMIETKPLE 180 

LY + + +LLFKL++ VLLKRGDNGSMGD+ACI E Ij+ Y +FDR+ + +E + L 
Sbjct: 121 LYNLSVTJQNLLFKLTSAvIjLI<RGDNGSMGDLACIVAEDLVIjYQSFDRQKAAAWLEEENIiA 180 

Query: 181 QVLEAEWGYRITKIQALLEMDFLVGWTMQPSISKEMINIVKSTITQRFLDDTKYQWQLL 240 

VLE +WG+ LE DFLVGWT + ++S M+ +K I Q FL +K W L+ 

Sbjct: 181 TVLERDWGFFISQVKPTLECDFLVGWTKEVAVSSHMVQQIKQNINQNFLSSSKETWSLV 240 



Query: 241 £ 

A +4G E + +E S LL L IYT L++LKEAS+ L V KSSG+GGGDCGI 
Sbjct: 241 EALEQGKAEKVIEQVEVASKLLEGLSTDIYTPLLRQLKEASQDLQAVAKSSGAGGGDCGI 300

Query: 301 AISFN-KNDNQTLIKRWESAGIELLSKETL 329 

A+SF+ ++ TL RW GIELL +E +
Sbjct: 301 ALSFDAQSSRNTLKNRWADLGIELLYQERI 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 5981> which encodes the amino acid 
sequence <SEQ ID 5982>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2669 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 171/325 (52%), Positives = 227/325 (69%), Gaps = 2/325 (0%) 

Query: 4 VQTGGKIjYIAGEYAIIiYPGQVAILKNVPIYMTAI^TFADI^SIiYSDMFNYTASLQPDKQY 63 

VQTGGKLY+ GEYAIL PGQ A++ +P+ MTA + A + L SDMF++ A + PD Y 
Sbjct: 22 VQTGGKLYLTGEYAI LTPGQKALIHF I PLMMTAEISPAAH I QIiASDMFSHKAGMTPDAS Y 81 



Query: 64 SLIQETILLMEEWLINFGKWIKPIHljEITGKLERYGLKFGIGSSGSVWLTIKaMA2ai.YE 123 

+LIQ T+ ++L' ++P L ITGK+ER G KFGIGSSGSV +LT+KA++A Y+. 

Sbjct: 82 ALIQATVKTFADYLGQSIDQLEPFSLIITGKMERDGKKFGIGSSGSVTLLTLKALSAYYQ 141 

Query: 124 IEMPSDLLFKLSAYVLLKRGDNGSMGDIACIAYEHLISYSAFDRRAVSKMIETKPLEQVL 183 

I + +LLFKL+AY LLK+GDNGSMGD IACIAY+ L++Y++FDR VS ++T PL+++L 
Sbjct: 142 ITLTPELLFKIAAYTLLKQGDNGSMGDIACIAYQTLVAYTSFDREQVSNWLQTMPLKKLL 201 

Query: 184 EAEWGYRITKIQALLEMDFLVGWTMQPSISKEMINIVKSTITQRFLDDTKYQWQ-LLSA 242

+WGY I IQ L DFLVGWT P+IS++MI V ++IT FL T YQ+ Q + A 
Sbjct: 202 VKDWGYHIQVIQPALPCDFLVGWTKIPAISRQMIQQVTASITPAFL-RTSYQLTQSAMVA 260 

Query: 243 FKEGDKEAIKRCLEEISLLLFNLHPSIYTDKLQKLKEASKGLDIVTKSSGSGGGDCGIAI 302
+EG KE +K+ L S LL LHP+IY KL L A + D V KSSGSGGGDCGIA+

Sbjct: 261 LQEGHKEELKKSLAGASHLLKELHPAIYHPKLVTLVAACQKQDAVAKSSGSGGGDCGIAL 320

Query: 303 SFNKNDNQTLIKRWESAGIELLSKE 327 
+FN++ TLI +W+ A I LL +E 
Sbjct: 321 AFNQDARDTLISKWQEADIALLYQE 345



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1930 

A DNA sequence (GBSx2039) was identified in S.agalactiae <SEQ ID 5983> which encodes the amino
acid sequence <SEQ ID 5984>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.75 Transmembrane 20 - 36 ( 18 - 36) 


Final Results 

bacterial membrane --- Certainty=0.1702 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1931

A DNA sequence (GBSx2040) was identified in S.agalactiae <SEQ ID 5985> which encodes the amino
acid sequence <SEQ ID 5986>. This protein is predicted to be mevalonate diphosphate decarboxylase. 
Analysis of this protein sequence reveals the following: 

Possible site: 25 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1557 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG02456 GB:AF290099 mevalonate diphosphate decarboxylase
[Streptococcus pneumoniae]
Identities = 219/312 (70%), Positives = 264/312 (84%)

Query: 1 MDGKSISVKSYANIAIIKYWGKADAEKMIPATSSISLTLENMYTETRLTALGKDAKKDEF 60
MD + ++V+SYANIAIIKYWGK ++M+PATSSISLTLENMYTET L+ L + DEF
Sbjct: 1 MDREPVTVRSYANIAIIKYWGKKKEKEMVPATSSISLTLENMYTETTLSPLPANVTADEF 60



Query: 61 YISGVLQNDHEHDKMSAILDRFRQNRSGFVKIETTNNMPTAAGLSSSSSGLSALVKACND 120

YI+G LQN+ EH KMS I+DR+R GFV+I+T NNMPTAAGLSSSSSGLSALVKACN 
Sbjct: 61 YINGQLQNEVEHAKMSKIIDRYRPAGEGFVRIDTQNNMPTAAGLSSSSSGLSALVKACNA 120 

Query: 121 FFGTNLSQSQLAQEAKFASGSSSRSFFGPVAAWDKDSGDIYKVHTNLDLAMIMLVLNDKR 180

+F L +SQLAQEAKFASGSSSRSF+GP+ AWDKDSG+IY V T+L LAMIMLVL DK+ 
Sbjct: 121 YFKLGLDRSQLAQEAKFASGSSSRSFYGPLGAWDKDSGEIYPVETDLKIAMIMLVLEDKK 180 

Query: 181 KPISSREGMKICTETSTTFNEWVRQSEQDYQDMLVYLKNNDFQKVGQLTERNALAMHSTT 240

KPISSR+GMK+C ETSTTF++WVRQSE+DYQDML+YLK NDF K+G+LTE+NALAMH+TT
Sbjct: 181 KPISSRDGMKLCVETSTTFDDWVRQSEKDYQDMLIYLKENDFAKIGELTEKNALAMHATT 240

Query: 241 KTATPAFSYLTEETYKAMDWKKLREKGHECYYTMDAGPNVKVLCLRQDLEALAAILEKD 300 

KTA+PAFSYLT+ +Y+AM V++LREKG CT+TMDAGPNVKV C +DLE L+ I + 
Sbjct: 241 KTASPAFSYLTDASYEAMAFVRQLREKGEACYFTMDAGPNVKVFCQEKDLEHLSEIFGQR 300



Query: 301 YRIIVSTTKELA 312 
YR+IVS TK+L+ 
Sbjct: 301 YRLIVSKTKDLS 312



A related DNA sequence was identified in S.pyogenes <SEQ ID 5987> which encodes the amino acid
sequence <SEQ ID 5988>. Analysis of this protein sequence reveals the following:
Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1271 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 221/313 (70%), Positives = 258/313 (81%)

Query: 1 MDGKSISVKSYANIAIIKYWGKADAEKMIPATSSISLTLENMYTETRLTALGKDAKKDEF 60
+D I+V SYANIAIIKYWGK + KMIP+TSSISLTLENM+T T ++ L A D+F
Sbjct: 1 VDPNVITVTSYANIAIIKYWGKENQAKMIPSTSSISLTLENMFTTTSVSFLPDTATSDQF 60

Query: 61 YISGVLQNDHEHDKMSAILDRFRQNRSGFVKIETTNNMPTAAGLSSSSSGLSALVKACND 120
YI+G+LQND EH K+SAI+D+FRQ FVK+ET NNMPTAAGLSSSSSGLSALVKAC+
Sbjct: 61 YINGILQNDEEHTKISAIIDQFRQPGQAFVKMETQNNMPTAAGLSSSSSGLSALVKACDQ 120

Query: 121 FFGTNLSQSQLAQEAKFASGSSSRSFFGPVAAWDKDSGDIYKVHTNLDLAMIMLVLNDKR 180
F T L Q LAQ+AKFASGSSSRSFFGPVAAWDKDSG IYKV T+L +AMIMLVLN +
Sbjct: 121 LFDTQLDQKALAQKAKFASGSSSRSFFGPVAAWDKDSGAIYKVETDLKMAMIMLVLNAAK 180

Query: 181 KPISSREGMKICTETSTTFNEWVRQSEQDYQDMLVYLKNNDFQKVGQLTERNALAMHSTT 240
KPISSREGMK+C +TSTTF++WV QS DYQ ML YLK N+F+KVGQLTE NALAMH+TT
Sbjct: 181 KPISSREGMKLCRDTSTTFDQWVEQSAIDYQHMLTYLKTNNFEKVGQLTEANALAMHATT 240

Query: 241 KTATPAFSYLTEETYKAMDWKKLREKGHECYYTMDAGPNVKVLCLRQDLEALAAILEKD 300
KTA P FSYLT+E+Y+AM+ VK+LR++G CY+TMDAGPNVKVLCL +DL LA L K+
Sbjct: 241 KTANPPFSYLTKESYQAMFAVKELRQEGFACYFTMDAGPNVKVLCLEKDLAQLAERLGKN 300

Query: 301 YRIIVSTTKELAD 313
YRIIVS TK+L D
Sbjct: 301 YRIIVSKTKDLPD 313

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1932 

A DNA sequence (GBSx2041) was identified in S.agalactiae <SEQ ID 5989> which encodes the amino 
acid sequence <SEQ ID 5990>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1512 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 5991> which encodes the amino acid
sequence <SEQ ID 5992>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1117 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 182/290 (62%), Positives = 223/290 (76%) 

Query: 1 MKEKFGIGKAHSKIILMGEHSVVYGYPAIAIPIjKNIEvTCLIEEAPQLIAI.DMTDPLSTA 60 

MEG GKAHSKI IL+GEH+WYGYPAIA+PL +IEV CI A + + D D LSTA 
Sbjct: 6 MNENIGYGKAHSKIILIGEHAWYGYPAIALPLTDIEWCHIFPADKPLVFDFyDTLSTA 65 

Query: 61 IFAALDY1.GKTSSKIAYHIESQVPERRGMGSSAAVAIAAIRAVFDYFDEDLEADLLECLV 120 

I+A+LDYL + IAY I SQVP++RGMGSSAAV+IAAIRAVF Y EL DLLE LV 
Sbjct: 66 IYASLDYLQRLQEPIAYEIVSQVPQKRGMGSSAAVSIAAIRAVFSYCQEPLSDDIiLEILV 125 

Query: 121 NRAEMIAHSNPSGIiDAKTCLSENTIKFIRNIGFSTVPMHLNAYLVIADTGIHGHTKEAVD 180 

N+AE+ IAH+NPSGLDAKTCLS++ IKFIRNIGF T+ + LN YL+IADTGIHGHT+EAV+ 
Sbjct: 126 NKAEIIAHTNPSGLDAKTCIjSDHAIKFIRNIGFETIEIALNGYLIIADTGIHGHTREAVN 185 

Query: 181 KVKSSGEAVLPFLKELGYLAEASEDAIHKSDSKQLGSLMTKAHQSLKQLGVSSLEADHliV 240 

KV E LP+L +LG L +A E AI++ + +G LMT+AH +LK +GVS +AD LV 
Sbjct: 186 KVAQFEETNLPYLAKLGALTQALERAINQKNKVAIGQLMTQAHSALKAIGVSISKADQLV 245 

Query: 241 EVAISCGALGAKMSGGGLGGCIIALVKEKREAERLSQQLEREGAVNTWTE 290

E A+ GALGAKM+GGGLGGC+ IAL K AE++S +L+ EGAVNTW +
Sbjct: 246 EAALRAGALGAKMTGGGLGGCMIAIADTKDMAEKISERLKEEGAVNTWIQ 295

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1933 

A DNA sequence (GBSx2042) was identified in S.agalactiae <SEQ ID 5993> which encodes the amino 
acid sequence <SEQ ID 5994>. This protein is predicted to be a histidine protein kinase. Analysis of this 
protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-13.43 Transmembrane 12 - 28 ( 4 - 33)
INTEGRAL Likelihood = -9.29 Transmembrane 163 - 179 ( 157 - 191)

Final Results 

bacterial membrane --- Certainty=0.6371 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF79919 GB:AF039082 putative histidine protein kinase
[Lactococcus lactis]
Identities = 78/315 (24%), Positives = 154/315 (48%), Gaps = 33/315 (10%)

[Alignment residue lines not recoverable from the source. Surviving block start positions (Query/Sbjct): 101/84, 161/143, 217/203, 262/263, 322/312, 382/370. Surviving match-line fragments:]

++W+ ++++F + VI+S+ L+ + P +A YEKQ+ F+ NA HEL+TPLAI +
L + +++LAR
+ +T++A + F G + +G
G ++V + ++ V++ + DDK F+RF+R D++



A related DNA sequence was identified in S.pyogenes <SEQ ID 5751> which encodes the amino acid
sequence <SEQ ID 5752>. Analysis of this protein sequence reveals the following: 
Possible site: 24 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.30 Transmembrane 18 - 34 ( 13 - 42) 
INTEGRAL Likelihood =-10.35 Transmembrane 170 - 186 ( 163 - 199) 



Final Results 

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 233/410 (56%) , Positives = 303/410 (73%) , Gaps = 1/410 (0%) 

Query: 1 MFRNLRLRFIGIAALAILVVLFSWGVLNSANKYQTI^IYRVLTILADNNGRIPNKLEF 60 
MF +R+RFI IA++AI ++L S+VG++N+A YQ++ EI R+L +++ N G++P E 
Sbjct: 10 MFNRIRIRFIMIASIAIFIILSSIVGIINTARCYQSQQEINRILHLISSNKGKLPGTTES 69

Query: 61 SKELGDDLSTDAIFQFRYFSARTDAKGNVTSFDSRNIFEVSDRQIKNYAKRIVSQNSHSG 120 

SK LG LS D++ QFRY+S +A G++ S ++ NI + + + +A+ G 
Sbjct: 70 SKRLGTKLSEDSLSQFRYYSVIFNANGHLLSSNTANISALDREEAQYFARLFAKSGEEKG 129 

Query: 121 HITYNFSTYSYDLKKVGKNDYLWFLDTTNQYLDNQRLLQLSIWMSLVSFIVFMVIVSVL 180 

+ S YSYL+ ++ + LW LDTT + LL +S+ ++ FI F+V+VS+ 

Sbjct: 130 SYRHQDSVYSYLITQLPNEEKLWILDTTFYFRSVGDLLAVSVMLAFGGFIFFWLVSLF 189 
SG VI PFV NYEKQRRFITNAGHELKTPLAI I SANNELVE+M+GESEWTKST+DQ++RL
Sbjct: 190 SCSIWIKPFVQI^KQRRFITNAGHELKTPLAIISM^IjVEIjMTGESEWTKSTSDQVKRL 249 

Query: 241 TGLINGMVSIARPEEQPDISMVDLDFSHITKDAAEDFKGPIIKDGKDFIMSIQPGIHVKA 300 

TGLIN M++LAR EEQPD+ + +DFS I +DAAEDFK ++KDGK F ++IQP I +KA 
Sbjct: 250 TGLINQMITIiARLEEQPDWLHIIVDFSAIAQDAAEDFKSLVLKDGKRFDLTIQPNIMIKA 309 

Query: 301 EEKSLFELVTLLVDNANKYCDPMGTVTVKLSRSSRLR-RAKLEVSNTYKNGKDIDYSKFF 359 

EEKSLFELVT+LVDNANKYCDP G V V L+ R R RAKLEVSNTY GK IDYS+FF 
Sbjct: 310 EEKSLFELVTILVDNANKYCDPKGLVKVSLTTIGRRRKRAKLEVSHTYLEGKSIDYSRFF 369 

Query: 360 ERFYREDESHNNKKSGYGIGLSIVTSLVHLFKGSIDVNYKHDTITFVIYI 409

ERFYREDESHN+K+ GYGIGLS+ S+V LFKG+I VNYK+D I F + I 
Sbjct: 370 ERFYREDESHNSKEKGYGIGLSMAESMVKLFKGTITVNYKNDAIVFTVVI 419 

SEQ ID 5994 (GBS273) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 14; MW 46kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 56 (lane 5; MW 71kDa). 

GBS273-GST was purified as shown in Figure 208, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1934 

A DNA sequence (GBSx2043) was identified in S.agalactiae <SEQ ID 5995> which encodes the amino 
acid sequence <SEQ ID 5996>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2181 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1935 

A DNA sequence (GBSx2044) was identified in S.agalactiae <SEQ ID 5997> which encodes the amino 
acid sequence <SEQ ID 5998>. This protein is predicted to be two-component response regulator (trcR). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2503 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9379> which encodes amino acid sequence <SEQ ID 9380>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04091 GB:AP001508 two-component response regulator [Bacillus halodurans]
Identities = 71/183 (38%), Positives = 120/183 (64%), Gaps = 3/183 (1%)

Query: 9 RVLIAEDEEQMSRVLSTAISHQGYVVDVAyDGQTAIDl^QNAYDWWMDVMMPVKTGIE 68
R+LI EDE+++4RVL + H+GY D it G ++ +A+D++++DVM+P +G+E
Sbjct: 3 RILIIEDEKKIARVLQLELEHEGYETDAAFSGSDGLSTFQAHAWDLVLLDVMLPELSGLE 62

Query: 69 AVKEIRQSGNKSHIIMLTAMAEIDDRVTGLDAGADDYLTKPFSLKELIARLRSMSRRLE- 127
++ IR + + II+LTA I D+V+GLD GA+DY+TKPF 4+ELIAR+R+ R ++
Sbjct: 63 VLRRrWITDPVTPllBLTARNSIPDKVSGLDLGSNDYITKPFEIESLLARVRACLRTVQT 122

Query: 128 -DFTPNVLSLGRVTLSVGEQELQCEN-TIRIAGKEAKMLAFFMLKHDKELSTQQLFEHVW 185
+ + L ++4Q H TI 1 KE ++L PF+ H + LS +Q+ +VW
Sbjct: 123 RERVEDTMFQELTI^KTRDVQRGlffiTIEr.TPKEFELLWFIKKKGQ'/LSREQILTNVW 182

Query: 186 GAD 188
G D
Sbjct: 183 GFD 185

A related DNA sequence was identified in S.pyogenes <SEQ ID 5999> which encodes the amino acid 
sequence <SEQ ID 6000>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2391 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 125/185 (67%), Positives = 151/185 (81%)



Query: 8 MRVLIAEDEEQMSRVLSTAISHQGYVVI)\akYDG^^ 67 
M++L+AEDE QMS VL+TA4+HQGY VDV ++GQ AID A NAYD+M++D+MMP+K+GI

Sbjct: 1 MKILI^DEWQMSNVLTTAMTHQGTO 60 

Query: 68 EAVKEIRQSGNKSHIIMLTAMAEIDDRVTGLDAGADDYLTKPFSLKELLARLRSMSRRLE 127
EA+KEIR SGN SHIIMLTAMAEI+DRVTGLDAGADDYLTKPFSLKELLARLRSM RR+E
Sbjct: 61 EALKEIRASGNCSHIIMLTAMAEINDRVTGLDAGADDYLTKPFSLKELIARLRSMERRVE 120

Query: 128 DFTPNVLSLGRVTLSVGEQELQCENTIRIAGKEAKMIAFFMI^MDKELSTQQLFEHVWGA 187 
ftp vr, vrr..t.+ v.nm, w twt.» ww. kj.j-qi? mt.m «■ t. t_l t.j ..ait™ 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 17 LEDFSQRIQLENDKAKVETGYKLYEHIIGRIKTSDSMIEKCRRKQLPVTVDSALKTIRDS 76 

L++ + +1 + + + Y EH+ R+K+ +S++ K +R+ T++S + +RD 
Sbjct: 29 LQELNTKIDILKQEFQYIHDYNPIEHVSSRVKSPESIVNKIQRRGNDFTLESIRENVRDI 88 

Query: 77 IGVRIICGFVNDIYQIIERIKAFDDCRIVVEKDYIQHVKPNGYRSYHVILEIDTPYPDCL 136

G+RI C F +DIY + E++ D +V KDYI++ KPNGYRS H+IL I P + 
Sbjct: 89 AGIRITCSFESDIYTLSEQLMQQHDISWETKDYIKNPKPNGYRSLHLILSI PIFM 144 

Query: 137 GNSDGKYYIEIQLRTIAQDSWASLEHQMKYKHDIENPERIVRELKRCADEMASVDLTMQT 196 

+ Y+E+Q+RTIA D WASLEH++ YK++ PE +++ELK A+ A +D M+ 

Sbjct: 145 SDRVQDVYVEVQIRTIAMDFWASLEHKIYYKYNKNVPEHLLKELKDAAESAALLDQKMEK 204

Query: 197 IR 198 
I+

Sbjct: 205 IQ 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6003> which encodes the amino acid 
sequence <SEQ ID 6004>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1057 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/206 (61%), Positives = 162/206 (77%) 

Query: 3 TNIYGDYGRYLPLILEDFSQRIQLENDKAKVETGYKLYEHIIGRIKTSDSMIEKCRRKQL 62 
++IY + YLPL+L+ + I EN K+K ETG+KLYEH RIK+ SMIEKC+RKQL

Query: 63 PVTVDSALKTIRDSIGVRIICGFVNDIYQIIERIKAFDDCRIVVEKDYIQHVKPNGYRSY 122 

P+T SALK I+DSIG+RIICGF++DIY++++ +K+ + EKDYI + KPNGYRSY 

Sbjct: 71 PLTSKSALKIIKDSIGIRIICGFIDDIYRMVDLLKSIPGMSVNTEKDYILNAKPNGYRSY 130 

Query: 123 HVILEIDTPYPDCLGNSDGKYYIEIQLRTIAQDSWASLEHQMKYKHDIENPERIVRELKR 182 

H+ILE++T +PD LG G Y+IE+QLRTIAQDSWASLEHQMKYKH + N E I RELKR 
Sbjct: 131 HLILELETHFPDILGEKKGCYFIEVQLRTIAQDSWASLEHQMKYKHQVANAEMITRELKR 190 

Query: 183 CADEMASVDLTMQTIRQLIESGTKKE 208 

CADE+AS D+TMQTIRQLI+ T++E 
Sbjct: 191 CADELASCDVTMQTIRQLIQETTEEE 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1937 

A DNA sequence (GBSx2046) was identified in S.agalactiae <SEQ ID 6005> which encodes the amino 
acid sequence <SEQ ID 6006>. Analysis of this protein sequence reveals the following: 
Final Results

bacterial cytoplasm --- Certainty=0.3250 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA37193 GB:X53013 ORF1 (AA 1 - 384) [Lactococcus lactis]
Identities = 30/55 (54%), Positives = 37/55 (66%)

Query: 1 MEFYYKTLKRKFINDADTIFIEQSQFEIFIYIETDHNSSSSHWLDYQSQKEFEK 55

ME +YKTLKR+ INDA ++ EIF YIET +N+ H LDYQS K+FEK 

Sbjct: 327 MESFYKTLiCTlELiraDMFETRAEATQEIFKYIETYYM'KWMHSGLDYQSPKDFEK 381 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6007> which encodes the amino acid
sequence <SEQ ID 6008>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3055 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 31/59 (52%) , Positives = 39/59 (65%) 

Query: 1 MEFYYKTLKRKFINDADTIFIEQSQFEIFIYIETDHNSSSSHWLDYQSQKEFEKIITN 59 
ME +YKTLKR+ +NDA I+Q+Q EIF Y ET +N H L Y S EFEKI+T+ 
Sbjct: 13 MEaFYKTLKRELvOTJAHFATIKCAQLEIFKYSETYYNPKRLHSALCTLSPVEFEKIVTH 71

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1938 

A DNA sequence (GBSx2047) was identified in S.agalactiae <SEQ ID 6009> which encodes the amino
acid sequence <SEQ ID 6010>. This protein is predicted to be R5 protein. Analysis of this protein sequence 
reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -3.98 Transmembrane 30 - 46 ( 29 - 51)

INTEGRAL Likelihood = -2.76 Transmembrane 967 - 983 ( 966 - 985)

Final Results 

bacterial membrane --- Certainty=0.2593 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8935> which encodes amino acid sequence <SEQ ID 8936> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8

SRCFLG: 0

McG: Length of UR: 2

Peak Value of UR: 2.44

Net Charge of CR: 2
McG: Discrim Score: 0.78

GvH: Signal Score (-7.5): -0.0599995



Possible site: 39 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 40 
ALOM program count: 0 value: 7.37 threshold: 0.0 
PERIPHERAL Likelihood = 7.37 194

modified ALOM score: -1.97 



*** Reasoning Step: 3 

Rule gpol

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

LPXTG motif: 944-948 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8936 (GBS200) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 29 (lane 3; MW 107.4kDa), in Figure 169 (lane 4; MW 122kDa) and in Figure 
238 (lane 11; MW 122kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis 
of total cell extract is shown in Figure 35 (lane 3; MW 132kDa). 

Purified Thio-GBS200-His is shown in Figure 244, lane 9. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1939 

A DNA sequence (GBSx2048) was identified in S.agalactiae <SEQ ID 6011> which encodes the amino
acid sequence <SEQ ID 6012>. This protein is predicted to be a 16.1 kDa transcriptional regulator. Analysis
of this protein sequence reveals the following:

Possible site: 25 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3919 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9953> which encodes amino acid sequence <SEQ ID 9954>
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB16108 GB:Z99124 similar to transcriptional regulator (MarR 
family) [Bacillus subtilis] 
Identities = 30/114 (26%), Positives = 59/114 (51%), Gaps = 3/114 (2%)

Query: 29 DVEHLAGPQGHLVMYLYKHPDKDMSIKAVEEILHISKSVASNLVKRMEKNGFIAIVPSKT 88

D++ G +LV +Y++P + + + E++ + ++ A+ +K++E GFI +P +
Sbjct: 25 DLDLTRGQYLYLW-IYENPG--IIQEKIAEMIKVDRTTAARAIKKLEMQGFIQKLPDEQ 81 

Query: 89 DKRVKYLYLTHLGKKKATQFEIFLEKLHSTMIAGITKEEIRTTKKVIRTLAKNM 142

+K++K L+ T GKK E L+G T EE T ++ + KN+ 

Sbjct: 82 NKKIKKLFPTEKGKKVYPLLRREGEHSTEVALSGFTSEEKETISALLHRVRKNI 135 
A related DNA sequence was identified in S.pyogenes <SEQ ID 6013> which encodes the amino acid
sequence <SEQ ID 6014>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.4175 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 27/64 (42%), Positives = 46/64 (71%)

Query: 3 MENPLQKARILWQLEKYLDHYAKEYDVEHLAGPQGHLVMYLYKHPDKDMSIKAVEEILH 62 

M + R L++Q+E+ D AK+YDVEHLAGPQG+ ++++L KH ++++ +K +E+ L 
Sbjct: 1 MSQVIGDLRELIHQIEQISDEIAKKYDVEHIAGPQGYVLVFLAKHQNQEIFVKDIEKQLR 60 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1940 

A DNA sequence (GBSx2049) was identified in S.agalactiae <SEQ ID 6015> which encodes the amino 
acid sequence <SEQ ID 6016>. This protein is predicted to be 5'-nucleotidase family protein. Analysis of
this protein sequence reveals the following: 
Possible site: 27 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.66 Transmembrane 668 - 684 ( 665 - 684)



Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12747 GB:Z99108 similar to 5'-nucleotidase [Bacillus subtilis]
Identities = 178/535 (33%), Positives = 270/535 (50%), Gaps = 55/535 (10%)

[Alignment residue lines not recoverable from the source. Surviving block start positions (Query/Sbjct): 28/586, 85/640, 145/698, 205/745, (lost)/805, 325/860, 385/(lost), 445/969, 505/1022. Surviving fragments:]

--AYMDDAQKDFKQTNPNG 84
K A+EL+ K VKAI VLAH+ A
L +EI G+DL + +N Q I+G +TYT +KE G+
I PDA Y L +N+F+ A ++ LLG NP



A related DNA sequence was identified in S.pyogenes <SEQ ID 1607> which encodes the amino acid 
sequence <SEQ ID 1608>. Analysis of this protein sequence reveals the following: 



Possible site: 40 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -4.67 Transmembrane
INTEGRAL Likelihood = -2.02 Transmembrane

bacterial membrane --- Certainty=0.2869 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 415/683 (60%) , Positives = 517/688 (74%) , Gaps = 21/688 (3%) 

Query: 1 MKKKIILKSSVLGLVAGTSIMFSSVFADQVGVQVIGVNDFHGALDNTGTANMPDGKVANA 60 

MKK ILKSSVL ++ +++ + V ADQV VQ +GVNDFHGALDNTGTA P GK+ NA 
Sbjct: 14 MKKYFILKSSVLSILTSFTLLVTDVQADQVDVQFLGVNDFHGALDNTGTAYTPSGKIPNA 73 

Query: 61 GTAAQLDAYMDDAQKDFKQTNPNGESIRVQAGDMVGASPANSGLLQDEPTVKNFNAMNVE 120
GTAAQL AYMDDA+ DFKQ N +G SIRVQAGDMVGASPANS LLQDEPTVK FN M E 

Query: 121 YGTLGNHEFDEGIAEYNRIVTGI^APAPDSNINNITKSYPHEAAKQEIWANVIDKVNKQI 180 

YGTLGNHEFDEGL E+NRI +TG+AP P+S IN+ITK Y HEA+ Q IV+ANVIDK K I 
Sbjct: 134 YGTLGNHEFDEGLDEFNRIMTGQAPDPESTINDITKQYEHEASHQTIVIANVIDKKTKDI 193 

Query: 181 PYNWKPYAIKNIPVNNKSVNVGFIGIVTKDIPI^VIjRKNYEQYEFLDEAETIVKYAKELQ 240 

PY WKPYAIK+I +N+K V +GFIG+VT +IPNLVL++NYE Y+FLD AETI KYAKELQ 
Sbjct: 194 PYGWKPYAIKDIAINDKIVKIGFIGWTTEIPNLVLKQNYEHYQFLDVAETIAKYAKELQ 253 

Query: 241 AKKTVKAI VVIiAHVPATSKND I AEGEAAEMMKJOnjQLFPENS VD I VFAGHNHQYTNGLVGK 300 

++V AIWLAHVPATSK+ + + E A +M+KVNQ++PE+S+DI+FAGHNHQYTNG +GK 
Sbjct: 254 EQHVHAIVVIiAHVPATSI<DGVVDHEMATVMEKVNQIYPEHSIDIIFAGHNHQYTNGTIGK 313 

Query: 301 TRIVQALSQGKAYADTOGJVLDTDTQDFIETPSAKVIAVAPGKKTGSADIQAIVDQANTIV 360 

TRIVQALSQGKAYADVRG LDTDT DFI+TPSA V+AVAPG KT ++DI+AI++ AN IV 
Sbjct: 314 TRIVQALSQGKAYADVRGTLDTDTNDFIKTPSANWAVAPGIKTENSDIKAIINHANDIV 373 

Query: 361 KQVTEAKIGTAEVSVMITRSVDQDWSPVGSLITEAQIAIARKSWPDIDFAMTNNGGIRA. 420 

K VTE KIGTA S I+++ + D SPVG+L T AQL IA+K++P +DFAMTNNGGIR+ 
Sbjct: 374 KTVTERKIGTATNSSTISKTFJ^IDKESPVGNIATTAQLTIAKKTFPTVDFAMTNNGGIRS 433 

Query: 421 DLLIKPDGTITWGARQAVQPFGNILQWEITGRDLYPCALNEQYDQKQNFFLQIAGLRYTY 480 

DL++K D TITWGAAQAVQPFGNILQV+++TG+ +Y LN+QYD+ Q +FLQ++GL YTY 
Sbjct: 434 DLVVKNDRTITWGAAQAVQPFGNII^VIQIvrraQHIYDVLNQQYDENQTYFLQMSGLTYTY 493 
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Query: 481 TDNKEGGEETPFKWKAYKSNG3EINPDAKTKLVINDFLFGGGDGFASFRNAKLLGAINP 540 

TDN +TPFK+VK YK NGEEIN Y +V+NDFL+GGGDGF++F+ AKL+GAIN 

Sbjct: 494 TDITOPKKSDTPFKIVKVYKDNGEEI^TTTYTVVVNDFLYGGGDGFSAFKKAKLIGAINT 553 

Query: 541 DTEVFMAYITDLEKAGKKVSVPNNKPKIYVTMKMVNETITQ1JDGTHSIIKKLYLDRQGNI 600 

DTE F+ YIT+LE +GK V+ K YVT + + T + G HSII K++ +R GN 

Sbjct: 554 DTEAFITYimLEASGKTWATIKGVKNYVTSNLESSTKTOISAGKHSIISKVFRNRDGNT 613 

Query: 601 VAQEIVSDTIMQTKSKSTKINPVTTIHKKQLHQFTAINPMRNYGKPSNSTTVKSKQLPKT 660 

V+ E++SD L T++ + + T +N T+ S LP T 

Sbjct: 614 VSSEVISDLLTSTENTNNSLGKKET TTNKNTISSSTLPIT 653 

Query: 661 NSEYGQSFLMSVFG-VGLIGIALNTKKK 687 

Y S +M++ + L G+ KK+ 
Sbjct: 654 GDNYKMSPIMTILALISLGGLNAFIKKR 681 

SEQ ID 6016 (GBS328) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 69 (lane 4; MW 73kDa). The GBS328-His fusion product was purified (Figure 
213, lane 9) and used to immunise mice. The resulting antiserum was used for FACS (Figure 268), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1941 

A DNA sequence (GBSx2050) was identified in S.agalactiae <SEQ ID 6017> which encodes the amino 
acid sequence <SEQ ID 6018>. This protein is predicted to be peptide deformylase (def-2). Analysis of this 
protein sequence reveals the following: 
Possible site: 21

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.70 Transmembrane 55 - 71 ( 55 - 74)

Final Results

bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
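
Localisation reports of this kind can be read programmatically. The following Python sketch is illustrative only and is not part of the original analysis. It assumes each result line has the form `<location> --- Certainty=<value> (<verdict>)` and tolerates stray spaces inside the number; the name `parse_psort_results` is hypothetical.

```python
import re

# Matches e.g. "bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>".
# Stray spaces inside the number (a common scanning artefact) are tolerated.
RESULT_LINE = re.compile(
    r"(bacterial \w+)\W*Certainty\s*=\s*([0-9][0-9. ]*)\(([^)]*)\)"
)


def parse_psort_results(lines):
    """Return {location: (certainty, verdict)} for each recognised line."""
    results = {}
    for line in lines:
        match = RESULT_LINE.search(line)
        if match:
            location, raw_value, verdict = match.groups()
            certainty = float(raw_value.replace(" ", ""))
            results[location] = (certainty, verdict.strip())
    return results


example = [
    "bacterial membrane --- Certainty=0.1680 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
]
results = parse_psort_results(example)

# The predicted localisation is taken here to be the highest-certainty entry.
best_location = max(results, key=lambda name: results[name][0])
print(best_location, results[best_location])
```

Selecting the entry with the highest certainty is an assumption of this sketch; the report itself marks the chosen localisation with "(Affirmative)".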

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB09662 GB:Z96934 peptide deformylase [Clostridium beijerinckii]

Identities = 71/136 (52%), Positives = 96/136 (70%)

Query: 1 MIKPIVRDTFFLQQKSQMASRADVSLAKDLQETLHANQNYCVGMAANMIGSLKRVIIINV 60 

MIKPIV+D FL QKS+ A++ D+ + DL +TL AN +CVG+AANMIG KR+++ V 
Sbjct: 1 MIKPIVKDILFLGQKSEEATKNDMWIDDLIDTLRAOT^EHCVGI^AAM^IGViaailLVFTV 60 

Query: 61 GITNLVMFNPVVVAKSDPYETEESCLSLVGCRSTQRYCHITISYRDINWKEQQIKLTDFP 120

G + M NPV++ K PYETEESCLSL+G R T+RY I ++Y D N+ +++ F 
Sbjct: 61 GNLIVPMINPVILKKEKPYETEESCLSIjIGFRKTKRYETIEVTYLDRNFNKKKQVFNGFT 120 

Query: 121 AQICQHELDHLEGILI 136 

AQI QHE+DH EGI+I 
Sbjct: 121 AQIIQHEMDHFEGIII 136 
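
The identity and positive counts quoted for an alignment can be checked block by block. The Python sketch below is illustrative and not part of the original analysis. It assumes a gap-free block whose three lines have equal length, counts a column as an identity when the two residues are equal, and counts a `+` on the middle line as a conservative substitution; the name `score_alignment_block` is hypothetical.

```python
def score_alignment_block(query, midline, sbjct):
    """Count identities and positives in one gap-free alignment block.

    Identities are columns where both residues are equal. Positives add
    the columns that the middle line marks with '+'.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("all three lines must have the same length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s)
    positives = identities + midline.count("+")
    return identities, positives


# Final block of the alignment above (residues 121-136).
query = "AQICQHELDHLEGILI"
midline = "AQI QHE+DH EGI+I"
sbjct = "AQIIQHEMDHFEGIII"

identities, positives = score_alignment_block(query, midline, sbjct)
print(f"{identities}/{len(query)} identities, {positives}/{len(query)} positives")
```

Summing these per-block counts over all blocks should reproduce the totals in the "Identities" and "Positives" header, provided the blocks are legible.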

A related DNA sequence was identified in S.pyogenes <SEQ ID 6019> which encodes the amino acid
sequence <SEQ ID 6020>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.[illegible] Transmembrane 55 - 71 ( 55 - 73)

Final Results

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>




An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/136 (56%) , Positives = 103/136 (75%) 

Query: 1 MIKPIVRDTFFLQQKSQMASRADVSLAKDLQETLHANQNYCVGMAANMIGSLKRVIIINV 60

MI+ 1+ D F LQQK+Q+A + D+ + +DLQ+TL + C+GMAANMIG KR++I+++ 
Sbjct: 1 MIE^lITDHFLLQQKAQVAiaCEDLWIGQDLQDTIAFYRQECLGMAAM4IC3EQKRIVIVSM 60 

Query: 61 GITNLVMFNPVVVAKSDPYETEESCLSLVGCRSTQRYCHITISYRDINWKEQQIKLTDFP 120

G +LVMFNPV+V+K Y+T+ESCLSL G R TQRY IT+ Y D NW+ +++ LT
Sbjct: 61 GFIDLVMFNPVMVSKKGIYQTKESCLSLSGYRKTQRYDKITVEYLDHNWRPKRLSLTGLT 120 

Query: 121 AQICQHELDHLEGILI 136 

AQICQHELDHLEGILI 
Sbjct: 121 AQICQHELDHLEGILI 136 
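
The summary line that heads each alignment can also be parsed for checking. The Python sketch below is illustrative and not part of the original analysis. It assumes fields of the form `Name = count/length (percent%)` and allows the irregular spacing found in scanned text; the name `parse_blast_summary` is hypothetical.

```python
import re

# One "Name = n/d (p%)" field of an alignment summary line. Optional spaces
# are allowed because scanned text is inconsistent about them.
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_blast_summary(line):
    """Return {name: (count, length, reported_percent)} for each field."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


summary = parse_blast_summary(
    "Identities = 77/136 (56%) , Positives = 103/136 (75%)"
)
for name, (count, length, percent) in summary.items():
    print(f"{name}: {count}/{length} = {count / length:.3f} (reported {percent}%)")
```

Because every field of one summary line refers to the same alignment, the lengths should agree; a mismatch between them is a cheap indicator of a scanning error.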

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1942 

A DNA sequence (GBSx2051) was identified in S.agalactiae <SEQ ID 6021> which encodes the amino
acid sequence <SEQ ID 6022>. Analysis of this protein sequence reveals the following: 
Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2880 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05820 GB:AP001514 NADP-specific glutamate dehydrogenase [Bacillus halodurans]
Identities = 298/444 (67%), Positives = 362/444 (81%), Gaps = 2/444 (0%)

Query: 7 YVASVLEKVKKQNEHEEEFLQAVEEVFESLVPVFDKYPQYIEENLLERLVEPERVISFRV 66

YV V E VK++N +E EF QAV+EVF+SL+PV K+PQY+++ +LER+VEPERVISFRV
Sbjct: 16 YVQHVYETVKRRNPNEHEFHQAVKEVFDSLLPVLVKHPQYVKQAILERIVEPERVISFRV 75

Query: 67 PWVDDKGQVQVNRGYRVQFSSAIGPYKGGLRFHPTVTQSIVKFLGFEQIFKNSLTGLPIG 126

PWVDD+G VQVNRG+RVQF+SA+GPYKGGLRFHP+V SI+KFLGFEQIFKN+LTG PIG
Sbjct: 76 PWVDDQGNVQVNRGFRVQFNSALGPYKGGLRFHPSVNASIIKFLGFEQIFKNALTGQPIG 135

Query: 127 GGKGGSNFDPKGKSDNEVMRFTQSFMTELQKYIGPDLDVPAGDIGVGGREIGYLYGQYKR 186

GGKGGS+FDPKGKSD E+MRF+QSFM+EL YIGPD+DVPAGDIGVG +EIGY++GQYK+
Sbjct: 136 GGKGGSDFDPKGKSDGEIMRFSQSFMSELSNYIGPDIDVPAGDIGVGAKEIGYMFGQYKK 195

Query: 187 L-NGYQNGVLTGKGLTYGGSLARTEATGYGAVYFAKEMLAARGQDLTGKVALVSGSGNVA 245

+ G++ GVLTGKG+ YGGSLAR EATGYG VYF +EM+ G G +VSGSGNV+
Sbjct: 196 MRGGFFAGVLTGKGIGYGGSLARKEATGYGTVYFVEEMIKDHGFSFAGSTVVVSGSGNVS 255

Query: 246 IYATEKLQELGATVVAVSDSSGYVYDPDGIDLETLKQIKEVERARIVKYTEKHPKANFTP 305

IYA EK +LGA VVA SDS GYVYD +GIDL+T+K++KEVER RI +Y +HP A++
Sbjct: 255 IYAIffiKAMQLGAKVraCSDSGGYVYDKNGIDLQT^ 315

Query: 306 ADQGS IWS I KADLAFPCATQMELDEEinftKLLVEMGVIAVTEGANMPSTLGAI KVFQKAGV 365 
G IWS+ D+A PCATQNELDE A +L+ NGV AV EGANMPSTL A+ FQ+ GV
Sbjct: 316 GCSGrlWSVPCDIALPCATQlffiLDEAAATMLIANGVKaVGEGANMPSTLQAVHTFQEHGV 374 

Query: 365 AFGPAKAANAGGVAVSALEMAQNSSRI^Vn'FEEVDQELQRIMICTIFWASEAADEFGDSG 425 

F PAKAANAGGV+VSALEMAQNS+R AWTFEEVD +L IMK 1+ + +AA4 + SG 
Sbjct: 375 LFAPAKAANAGGVSVSAIiEMAQNSTRLAWTFEEVDAKLYEIMKNIYRESIKAAELYEASG 434 

Query: 426 NLVLGANIAGFLKVAQAMSAQGIV 449 

NLV+GANIAGF+KVA AM + G+V 
Sbjct: 435 NLWGANIAGFVKVADAMISHGW 458 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1943 

A DNA sequence (GBSx2052) was identified in S.agalactiae <SEQ ID 6023> which encodes the amino 
acid sequence <SEQ ID 6024>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7[illegible] Transmembrane [illegible]
INTEGRAL Likelihood = -5[illegible] Transmembrane [illegible]
INTEGRAL Likelihood = -3[illegible] Transmembrane [illegible]
INTEGRAL Likelihood = -2[illegible] Transmembrane [illegible]

Final Results

bacterial membrane --- Certainty=0.4418 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>




A related GBS nucleic acid sequence <SEQ ID 9955> which encodes amino acid sequence <SEQ ID 9956>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 1944 

A DNA sequence (GBSx2053) was identified in S.agalactiae <SEQ ID 6025> which encodes the amino 
acid sequence <SEQ ID 6026>. This protein is predicted to be ABC transporter, ATP-binding protein 
(msbA). Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.72 Transmembrane 152 - [illegible] ( 147 - 192)
INTEGRAL Likelihood = -5.47 Transmembrane 267 - 283 ( 264 - 288)
INTEGRAL Likelihood = -4.30 Transmembrane 171 - 187 ( 169 - 192)
INTEGRAL Likelihood = -2.13 Transmembrane 67 - 83 ( 67 - 83)
INTEGRAL Likelihood = -0.32 Transmembrane 493 - 509 ( 493 - 509)

Final Results

bacterial membrane --- Certainty=0.5288 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69752 GB:AL137187 putative ABC transporter [Streptomyces coelicolor A3 (2)] 
Identities = 269/611 (44%) , Positives = 392/611 (64%), Gaps = 31/611 (5%) 

Query: 9 RLWSYLTRYKATLFLAlFLKVLSSFMSILEPFILGIiAITELTANLV--DMAKG -- 59

RL S +ATLF + V+S ++++ P ILG A + A +V DM G

Sbjct: 27 RLVSQFRPERATLFTLLACVWSVGLNWGPKI LGRATDLVFAGIVGRDMPSGATKEQVL 86

Query: 60 - - VSGAELNVPYIAGILI IYFFRGVFYELGSYGSNYFMTTW 99

V G ++ + +L++ L + + V

Sbjct: 87 ATMREHGDGNVADMLRSTDFVPGQGIDFGAVGEVLLLAIATFAVAGLLMAVATRLVNRAV 146

Query: 100 QKSIRDIRHDLNRKINKVPVSYFDKHQFGDMLGRFTSDVETVSNALQQSFLQIINAFLSI 159
+++ +R D+ K+++P+SYFDK Q G++L R T+D++ + LQQS Q+IN+ L+I

[The remaining blocks (Query residue 160 onward, Sbjct residue 147 onward) are illegible in the source apart from the following match-line fragments:]

I V+ M+ Y++ LA++ H

++K++GR+E S++

r KRSQP F +Q +

G+LN ++E TG

I F SGIM P++ +S+ Y+++A VGGL+

V +G L+IG++QAF+QY QS Pt + +A ++QS +S ER+FE+LD E

V f y p + kpi,;i 1

+AIVGPTGAGKTTL+NLLMRF

Y+VS G IT+DG DI +SR 1

AA+AA+ D F+RTLP C

GMVLQD WL+ GTI EN+ +G +

- +S G+KQL+TIARA L+DP IL+LDEATSSVD

TR E+LIQKAM KL GRTSFVIAHRLSTI++AD ILV++DG I+EQG H +LL G Y


There is also homology to SEQ IDs 160 and 6546. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1945 

A DNA sequence (GBSx2054) was identified in S.agalactiae <SEQ ID 6027> which encodes the amino 
acid sequence <SEQ ID 6028>. Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.88 Transmembrane 242 - 258 ( 235 - 263)
INTEGRAL Likelihood = -9.82 Transmembrane 159 - 175 ( 129 - 177)
INTEGRAL Likelihood = -9.71 Transmembrane 52 - 68 ( 49 - 77)

Final Results

bacterial membrane --- Certainty=0.5352 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
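
The transmembrane lines of such a report can be collected into an ordered list of predicted segments. The Python sketch below is illustrative and not part of the original analysis. It assumes each line carries a likelihood, a residue interval and a second interval in parentheses, which is treated here simply as a wider range around the segment; the name `parse_tm_segments` is hypothetical.

```python
import re

# Example: "INTEGRAL Likelihood =-10.88 Transmembrane 242 - 258 ( 235 - 263)"
TM_LINE = re.compile(
    r"Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(lines):
    """Return predicted segments sorted by the start of the first interval."""
    segments = []
    for line in lines:
        match = TM_LINE.search(line)
        if match is None:
            continue  # not a transmembrane line, or too damaged to read
        start, end, wide_start, wide_end = (int(g) for g in match.groups()[1:])
        segments.append({
            "likelihood": float(match.group(1)),
            "segment": (start, end),
            "range": (wide_start, wide_end),
        })
    return sorted(segments, key=lambda seg: seg["segment"][0])


report = [
    "INTEGRAL Likelihood =-10.88 Transmembrane 242 - 258 ( 235 - 263)",
    "Likelihood = -9.82 Transmembrane 159 - 175 ( 129 - 177)",
    "Likelihood = -9.71 Transmembrane 52 - 68 ( 49 - 77)",
]
segments = parse_tm_segments(report)
for seg in segments:
    print(seg["segment"], seg["likelihood"])
```

Lines whose numbers were lost in scanning are skipped rather than guessed, so the returned list may be shorter than the report.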

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69751 GB:AL137187 putative ABC transporter [Streptomyces coelicolor A3 (2)]

Identities = 228/565 (40%), Positives = 342/565 (60%), Gaps = 1/565 (0%)

Query: 6 SYLKRYPNWLWLDLLGAMLFVWILGMPTAIAGMIDNGOTKGDRTGVYLWTFIMFIFVVL 65 

+YL+ Y + L + L L +PT A +ID GV KGD + + +M +

Sbjct: 8 TYLRPYKKPIALLVALQFLQTCASLYLPTLNAHIIDEGWKGaSGYILSYGALMIGISIA 67 

Query: 66 GIIGRITMAYASSRLTTTMIRDMRNDIWAKLQEYSHHEYEQIGVSSLVTRMTSDTFVLMQ 125 

++ I + +R + RD+R ++ ++Q +S E G SL+TR T+D + 
Sbjct: 68 QWCNIGAVFYGARTAAALGRDVRGAVFDRVQSFSAREVGHFGAPSLITRTTNDVQQVQM 127 

Query: 126 FAEMSLRLGLVTPMVMIFSWMILITSPSLAWLVAVAMPLLVGVILYVAIKTKPLSERQQ 185 

A M+ L + P++ + +VM L L+ ++ +P+L + + K +PL + Q 

Sbjct: 128 IALMTFTL^WSAPIMCVGGIVMALGLDVPLSGVLLGWPVIAICOTLIVRKLRPI 1 FRKNQ 187 

Query: 186 TMLDKINQYWENLTGLRWRAFARENFQSQKFQVANQRYTDTSTGLFKLTGLTEPLFVQ 245 

LD +N+ +RE +TG RV+RAF R+ ++ Q+F+ AN T+ + G L L P+ + 
Sbjct: 188 WLDTVNRVLREQITG^VIRAFWDEYEQQRFRKANTELTEVALGTGNLIALMFPVVMT 247 

Query: 246 IIIAMIVAIVWFALDPLQRGAIKIGDLVAFIEYSFHALFSFLLFANLFTMYPRMWSSHR 305 

++ +A+VWF + G ++IGDL AF+ Y + S ++ +F M PR V + R 
Sbjct: 248 VVmsSIAVWFGAHRIDSGGMQlGDLTAFIAYLMQIVMSVMMATFMFM^PRAEVCAER 307 

Query: 306 IREVMDMPISINPNTEGVTDTKLKGHLEFDNVTFAYPGETESPVLHDISFKAKPGETIAF 365 

I+EV++ S+ P VT+ + GHLE F YPG E PVL I A+PGET A 

Sbjct: 308 IQEVLETESSWPPVAPVTELRRHGHLEIREAGFRYPG-AEEPVLRHIDLVARPGETTAV 366 

Query: 366 IGSTGSGKSSLVNLIPRFYDVTLGK1LVDGVDVRDYNLKSLRQKIGFIPQKALLFTGTIG 425 

IGSTGSGKS+L+ L+PR +D T G++LV+GVDVR + K+L + + +PQK LF GT+ 
Sbjct: 367 IGSTGSGKSTLLGLVPRLFDATDGEVLVNGVDVRTVDPKTLAKWSLVPQKPYLFAGTVA 426

Query: 426 ENLKYGKADATIDDLRQAVDISQAKEFIESHQEAFETHLAEGGSNLSGGQKQRLSIARAV 485

NL+YG DAT ++L A+ ++QAKEF+ + + +A+GG+N+SGGQ+QRL+IAR +
Sbjct: 427 TNLRYGNPDATDEELWHAl^VAQAKEFVSELEGGLDAPIAQGGrNVSGGQRQRLAIARTL 486

Query: 486 VKDPDLYIFDDSFSALDYKTDATLRARLKEVTGDSTVLIVAQRVGTIMDADQIIVLDEGE 545

V+ P++Y+FDDSFSALDY TDA LRA L + T ++TV+IVAQRV TI DAD+I+VLDEG
Sbjct: 487 VQRPEIYLFDDSFSALDYATDAALRAELAQETAEATWIVAQRVATIRDADRIWLDEGR 546

Query: 546 IVGRGTHAQLIENNAIYREIAESQL 570

+VG G H +L+ +N YREI SQL
Sbjct: 547 WGVGRHHELMADNETYRE IVLSQL 571 

A related DNA sequence was identified in S.pyogenes <SEQ ID 4985> which encodes the amino acid 
sequence <SEQ ID 4986>. Analysis of this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-16.24 Transmembrane 155 - 171 ( 145 - [illegible])
INTEGRAL Likelihood = -[illegible].48 Transmembrane 130 - 146 ( 122 - [illegible])
INTEGRAL Likelihood = -5.04 Transmembrane 13 - 29 ( 12 - [illegible])
INTEGRAL Likelihood = -[illegible].04 Transmembrane [illegible]
INTEGRAL Likelihood = -4.14 Transmembrane [illegible]
INTEGRAL Likelihood = -1.70 Transmembrane [illegible]

Final Results

bacterial membrane --- Certainty=0.7496 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 175/511 (34%) , Positives = 296/511 (57%) , Gaps = 3/511 (0%) 

Query: 59 MFIFvvLGIIGRIT^YASSRLTTTMIRDMRNDMYAKLQEYSHHEYEQIGVSSLVTRMTS 118 

+ I +LG++ ++++ + DMR + K+Q++S+ E +LV R+T+ 

Sbjct: 56 LLIIALLGLMSGAINTVLRAKIAQGVSADmEKTFRKIQDFSYAl'IIEAFNAGNLVVRLTN 115 

Query: 119 OTFvTMQFAEMSLRrOSLWPMVMIPSVvMILITSPSIAWLVAVftMPLLVGvTLYVAIKTK 178 

D + M ++ P++ I + +M + T P L W++ V + L+ ++ V + 

Sbjct: 116 DINQlQSLvMMMFQILFRLPILFIGAFIMAVQTFPQLMWVIVUMVILIALIMGLVMRQMG 175 

Query: 179 PLSERQQTMLDKINQYVRENLTGLRWRAFARENFQSQKFQVANQRYTDTSTGLFKLTGL 238 

P + Q ++DKIN+ +ENL G+RW++F +E Q KF+ + + + h 

Sbjct: 176 PRFGKFQRLMDKINRIAKEmRGWWKSFVQEQ«2YTKFKETSNDLIJUiNLSIGYGFSL 235 

Query: 239 TEPLFVQIIIAMIVAIVWFALDPLQRGAIKIGDLVAFIEYSFHALFSFLLFANLFTMYPR 298 

+P + + + + ++ IG++ +F+ Y +FS ++ ++ R 

Sbjct: 236 MQPALMLVSYLAVYVS INWSTMVETDPTVIGHIASFMTYMMQIMFS 1 1 WGSMGMQVSR 295 

Query: 299 MWSSHRIREVMDMPISINPNTEGVTDTKLKGHLEFDNVTFAYPGETESPVXHDISFKAK 358 

VS RIR+++ +4 E + + G + FD+V+F YP + E P L ISF + 
Sbjct: 296 AFVSMARIRQILSTEPAMTFENE--KEETISGSIVFDDVSFTYPNDDE-PTLKHISFAIE 352 

Query: 359 PGETIAFIGSTGSGKSSLWLIPRFYDVTLGKILVDGVDVRDYNLKSLRQKIGFIPQKAL 418 

PG+ + +G+TGSGKS+L LIPR +D G+IL+ G ++ + +LRQ + + QKA+ 
Sbjct: 353 PGQMVGIVGATGSGKSTLAQLIPRLFDPQDGQILLGGKPIKTIjSQTTLRQSVSIVLQKAI 412 



Query: 479 LSIARAWKDPDLYIFDDSFSALDYKTDATLRARLKEVTGDSTVLIVAQRVGTIMDADQI 538 

LSIAR V+ P + I DDS SALD K++ ++ L +T +IVAQ++ +++ AD+I 

Sbjct: 473 LSIARGVINHPKILILDDSTSALDAKSEKRVQEALSHKLEGTTTVIVAQKISSWKADKI 532 

Query: 539 IVLDEGEIVGRGTHAQLIENNAIYREIAESQ 569 

+VLD+G+++G GTHA+L+ NNAIYREI E+Q 
Sbjct: 533 LVLDQGQLIGEGTHAELVANNAIYREIYETQ 563 

There is also homology to SEQ IDs 72 and 6552. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1946 

A DNA sequence (GBSx2055) was identified in S.agalactiae <SEQ ID 6029> which encodes the amino 
acid sequence <SEQ ID 6030>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2391 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA51784 GB:X73368 ORF 18.3 [Salmonella typhimurium]
Identities = 58/162 (35%), Positives = 92/162 (55%), Gaps = 8/162 (4%)

Query: 1 MIIRPIIKNDDQAVAQLIRQSLRAYDL--DKPDTAYSDPHLDHLTSYYEKIEKSGFFVIE 58 

+ +R I D+ A+A++IRQ Y L DK T +DP+LD h ' Y + + ++V+E 
Sbjct: 9 LTVRRITTADNAA1ARVIRQVSAEYGLTADKGYTV-ADPNLDELYQVYSQ-PGAAYWWE 65 

Query: 59 ERDEIIGCGGFGPLKNL IAEMQKVYIAERFRGKGLATDLVKMIEVEARKIGYRQLYL 115 

+ ++G GG PL I E+QK+Y RG+GLA L M AR+ G+++ YL 

Sbjct: 67 QNGCWGGGGVAPLSCSEPDICELQKl^FLPVIRGQGLAKKLALMALDHAREQGFKRCYL 126 

Query: 116 ETASTLSRATAVYKHMGYCALSQPIANDQGHTAMDIWMIKDL 157 

ET + L A A+Y+ +G+ +S+P+ GH ++ M+KDL 
Sbjct: 127 ETTAFLREAIALYERLGFEHISEPL-GCTGHVDCEVRMLKDL 167 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1947 

A DNA sequence (GBSx2056) was identified in S.agalactiae <SEQ ID 6031> which encodes the amino 
acid sequence <SEQ ID 6032>. This protein is predicted to be ABC transporter. Analysis of this protein 
sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12566 GB:Z99108 similar to ABC transporter (ATP-binding protein) [Bacillus subtilis]
Identities = 269/625 (43%), Positives = 397/625 (63%), Gaps = 11/625 (1%)

Query: 1 MSDFLVDGLTKSVGDKTVFSNVSFIIHSIjDRIGIIGVNGTGKTTLLDVISGELGFDGDRS 60
MS + L K+ GDKT+F ++SF I +RIG+IG NGTGK+TLL VI+G +
Sbjct: 1 MSILKAENLYKTYGDKTLFDHISFHIEENERIGLIGPNGTGKSTLLKVIAGLESIE--EG 58

[The blocks covering Query residues 61 to 415 are illegible in the source apart from the following match-line fragments and the final Sbjct line:]

- +L Q+P+ QT+L+ + S + M ++EYE L

KTVLSKLG+ D+ V ELSGG ++RV +A+ L+ AD

V+ +THDRYFL+ V RI+EL++ + Y+G

K++ L ++ELAW+R +AR+TKQ+ARI+R + LK

¥ R+GK+VI ENV +Y + ++ FN L+ +RIGI+G NG+GK

PD G+++IG+T+R+GY++Q M+G +VI+Y++E A+ VKT+ C
Sbjct: 358 TTLLNALaGRHTPDGGDITIGQTVRIGYYTQDHSEMNGELKVIDYIKETAEVVKTADGDM 417

Query: 416 SVTE-LLEQFLFPRSTHGTQIAKLSGGEKKRLYLLKILIEKPNVLLLDEPTNDLDIATLT 474

E +LE+FLFPRS T I KLSGGEK+RLYLL++L+++PNVL LDEPTNDLD TL+ 
Sbjct: 418 ITAEQMLERFLFPRSMQQTYIRKLSGGEKRRLYLLQVLMQEPNVLFLDEPTNDLDTETLS 477 

Query: 475 VLENFLQGFGGPVITVSHDRYFLDKVANKI IAFEDND - IREFFGNYTDYLDEKAFNEQNN 533 
VLE+++ F G VITVSHDRYFLD+V ++4-I FE N I F G+Y+DY++E + 

Query: 534 EVISKKESTKTSREKQSRKRMSYFEKQEWATIEDDIMILENTITRIEMDMQTCGSDFTRL 593 

+ + +E T + K+ RK++SY ++ EW IED I LE ++E D+ GSDF +4- 
Sbjct: 538 KP-AAEEKTAEAEPKKICRKKLSYKDQLEWDGIEDKIAQLEEKHEQLEADIAAAGSDFGKI 596 

Query: 594 SDLQKELDAKNEALLEKYDRYEYLS 618 

+L E E L DR+ LS 

Sbjct: 597 QELMAEQAKTAEELEAAMDRWTELS 621 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6033> which encodes the amino acid
sequence <SEQ ID 6034>. Analysis of this protein sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2591 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 467/624 (74%), Positives = 535/624 (84%), Gaps = 3/624 (0%) 
Sbjct: 1 

Query: 61 PFSSANDYKIAYLKQEPDFDDSQTILDTVLSSDLREMALIKEYELLLNHYEESKQSRLEK 120 

PFS ANDYKIAYL Q+P+F+D+ ++LDTVLS+D++ + LI++YELL+ +Y E KQ LE 
Sbjct: 61 PFSKANDYKIAYLTQDPEFNDAASVLDTVLSADVKAIQLIRQYELLMANYTEDKQESLES 120 

Query: 121 VMAEMDSLDAWSIESEVKTVLSKLGITDLQLSVGELSGGLRRRVQLAQVLLNDADLLLljD 180 

+M+EMD LDAWSIES+VKTVLSKLGITDIrt- VG+LSGG+RRRVQLAQVLL ADLLLLD 
Sbjct: 121 LMSEMDRLDAWSIESDVKTVLSKLGITDLEQKVGDLSGGMRRRVQLAQVLLGAADLLLLD 180 

Query: 181 EPTNHLDIDTIAWLTNFLKNSKKTVLFITHDRYFLDNVATRIFELDKAQITEYQGNYQDY 240 

EPTNHLDIDTIAWLT +LK +KKTVLFITHDRYFLD+VATRIFELDKA +TEYQGNYQDY 
Sbjct: 181 EPTNHLDIDTIAWLTTYLKTAKKTVLFITHDRYFLDHVATRIFELDKAGLTEYQGNYQDY 240 

Query: 241 VRLRAEQDERDAASLHKKKQLYKQELAWMRTQPQARATKQQARINRFQNLKNDLHQTSDT 300 

VRL+AEQDERDAA+LHKKKQLYKQELAWMRTQPQARATKQQARINRF +LK ++HQ S 
Sbjct: 241 TOLKAEQDERDAANLHKKKQLYKQELAWMRTQPQARATKQQARINRFSDLKKEVHQDSSA 300 

Query: 301 SDLEMTFETSRIGKKVINFENVSFSYPDKSIIiKDFNLLIQNKDRIGIVGDNGVGKSTLLN 360 

LEMTFETSRIGKKVI+FE++SF+Y D+ ++KDFNL+IQNKDRIGIVGDNGVGKSTLIjN 
Sbjct: 301 DKLEMTFETSRIGKKVIHFEDLSFAYGDRQLIKDFNLIIQNKDRIGIVGDNGVGKSTLLN 360 

Query: 361 LIVQDLQPDSGNVSIGETIRVGYFSQQLHNIClGSKRVINYLQEVADEvKTSVGTTSVTEL 420 

+ 1 DL+P SG + IG+TTRVGYFSQQL +MD +KRVINYLQEVADEVKTSVGTTS++EL 
Sbjct: 361 IINGDLKPTSGKLDIGDTIRVGYFSQQLKDMDETKRVINYLQEVADEVKTSVGTTSISEL 420 

Query: 421 LEQFliFPRSTHGTQIAKLSGGEKKRLYLLKILIEKPNvLLLDEPlTOLDIATLTVLENFL 480 

LEQFLFPRS+HGT IAKLSGGEKK?LYLLK+I,IEKPNVLLLDEPTNDLDIATL VLENFL 
Sbjct: 421 LEQFLFPRSSHGTLIAKLSGGEKKRLYLLKLLIEKPNVLLLDEPINDLDIATLKVLENFL 480 



Query: 481 QGFGGPVITVSHDRYFLDKVANKIIAFEDNDIREFFGNYTDYLDEKAFNEQNNEVISKKE 540 
F GPVITVSHDRYFLDKVA KI+AFE+ DIR F+GNY+DYLDEK F ++ E K
Sbjct: 481 ANFAGPVITVSHDRYFLDKVATKILAFEEGDIRVFYGNYSDYLDEKVFEKETVEADLAKT 540

Query: 541 STKTS REKQSRKRMSYFEKQEWATIEDDIMILENTITRIEKDMQTCGSDFTRLSDLQ 597 

+ +K+ RKRMSY EKQEWA I EE 

Sbjct: 541 TVTEEVPLPQKEERKRMSYLEKQEWAQIEE 

Query: 598 KELDAKNEALLEKYDRYEYLSELD 621 

KELD +N LL Y+R+EYLS LD 
Sbjct: 601 KELDQRNNDLLLAYERFEYLSGLD 624 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1948 

A DNA sequence (GBSx2057) was identified in S.agalactiae <SEQ ID 6035> which encodes the amino 
acid sequence <SEQ ID 6036>. This protein is predicted to be poly(a) polymerase (papS). Analysis of this 
protein sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2658 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9957> which encodes amino acid sequence <SEQ ID 9958> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB38446 GB:L47709 poly(A) polymerase [Bacillus subtilis] 
Identities = 157/395 (39%), Positives = 235/395 (58%), Gaps = 14/395 (3%) 

Query: 11 FQKALPILKKIKKAGYEAYFVGGSVRDVLLDRPIHDVDIATSSYPEETKQIFKRTVDVGI 70
F KALP+L+ + +AG++AYFVGG+VRD + R I DVDIAT + P++ +++F+RTVDVG
Sbjct: 5 FIKALPVLRILIEAGHQAYFVGGAVRDSYMKRTIGDVDIATDAAPDQVERLFQRTVDVGK 64

[The blocks covering Query residue 71 onward are illegible in the source apart from the following match-line fragments:]

+RF EDALR++R +RF +

SLL +SVER IEF+KLL

TS E+ WA+L++++ +

IE A+V G+L N+++ I +-+K

A related DNA sequence was identified in S.pyogenes <SEQ ID 6037> which encodes the amino acid
sequence <SEQ ID 6038>. Analysis of this protein sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2023 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 256/400 (64%), Positives = 312/400 (78%) 

Query: 2 MRLNYLPSEFQKALPILKKIKKAGYEAYFVGGSVRDVLLDRPIHDVDIATSSYPEETKQI 61

M+L +PSEFQKALPIL KIK+AGYEAYFVGGSVRDVLL+RPIHDVDIATSSYPEETK I 
Sbjct: 1 MKLMTMPSEFQKALPILTKIKEAGYEAYFVGGSVRDVLLERPIHDVDIATSSYPEETKAI 60 

Query: 62 FKRTVDVGIEHGTVLVLEKGGEYEITTFRTEEVYVDYRRPSQVNFVRSLEEDLKRRDFTV 121

F RTVDVGIEHGTVLVLE GGEYE ITTFRTE+ + YVDYRRPSQV+ FVRSLEEDLKRRDFTV 
Sbjct: 61 FNRTVDVGIEHGTVLVLENGGEYEITTFRTEDIYVDYRRPSQVSFVRSLEEDLKRRDFTV 120 

Query: 122 NAFALNEDGEVIDLFHGLDDLDNHLLRAVGIASERFNEDALRIMRGLRFSASMFDIETT 181 

NA AL+E+G+VID F GL DL LRAVG A ERF EDALRIMRG RF+ASL+FDIE 
Sbjct: 121 NALALDENGQVIDKFRGLIDLKQKRLRAVGKAEERFEEDALRIMRGFRFAASLDFDIEAI 180 

Query: 182 TFEAMKKHASLLEKISVERSFIEFDKLLLAPYKRKGMLALIDSHAFNYLPCLKNRELQLS 241 

TFEAM+ H+ LLEKISVERSF EFDKLL+AP+WRKG+ A+I A++YLP LK +E L+ 
Sbjct: 181 TFEAMRSHSPLLEKISvERSFTEFDKLLMAPHWRKGISAMlACQAYDYLPGLKQQEAGIiN 240 

Query: 242 AFLSQLDKDFLFETSEQAWASLILSMEVEHTKTFLKKWKTSTHFQKDVEHIVDVYRIREQ 301 

+ L +F F QAWA +++S+ +E K+FLK WKTS FQ+ V ++ +YRIR++ 
Sbjct: 241 HL1VSDKDNFTFSDYHQAWAYVMISLAIEDPKSFLKAWKTSNDFQRYVTKLIALYRIRQE 300 

Query: 302 MGLTKEHLYRYGKT 1 1 KQAEGIRKARGLMVDFEKIEQLDSELAIHDRHE I WNGGTLI KK 361 

K +Y+YGK + E +RKA+ L VD ++I LD L IHD+H+IV+NG LIK 
Sbjct: 301 RSFEKLDIYQYGraomSLVEDLRKAQSLSVDMDRIOTLDQALVIHDKHDIVlNGSHLIKD 360 

G+K GPQ+G ++ ++ELAIV G+L N+ I FV++ L
Sbjct: 361 FGMKSGPQLGLMLEKVELAIVEGRLDNDFTTIEAFVREEL 400

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1949 

A DNA sequence (GBSx2058) was identified in S.agalactiae <SEQ ID 6039> which encodes the amino 
acid sequence <SEQ ID 6040>. Analysis of this protein sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2939 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07346 GB:AP001519 unknown conserved protein [Bacillus halodurans] 
Identities = 94/274 (34%) , Positives = 153/274 (55%) , Gaps = 2/274 (0%) 



Query: 2 KLALITDTSAYLPEAIENHEDVYVLDIPIIIDGKTYIEGQNLTLDQYYDKLAASKELPKT 61 
K+A++TD+ +AYL V V+ + ++ + Y E L+ +Y+KL ++LP T

Sbjct: 3 KIAIVTDSTAYLGPKRAKELGVIWPLSWFGEEAYQE3VELSSADFYEKLKHEEKLPTT 62

Query: 62 SQPSLAELDDLLCQLEKEGYTff/LGLFIAAGISGFWQNIQFDIEEHPNLTIAFPDTKITS 121 

SQP++ + +L KEG+ V+ + +++ ISG +Q+ + + D+ I + ' 

Sbjct: 63 SQPAVGLFVETFERLAKEGFEWI SIHLSSKI SGTYQSALTAGSMVEGIEVIGYDSGI SC 122 

Query: 122 APQGNLVRWALMCSREGMDFDVIVNKIQSQIEKIEGFIWNDLNHLVKGGRLSNGSAIIG 181 

PQ N V A +EG D I++ + ++ W+DL+HL +GGRL+ ++G 

Sbjct: 123 EPQANFVAEAAKLVKEGADPQTIIDHLDEVKKRTlilALFWHDLSHLHRGGRLNARQLVVG 182 

Query: 182 NLLSIKPVLHFNEEGKIVVYEKVRTEKKALKRLAEI-VKEMTADGEYDIAIIHSRAQDKA 240

+LL IKP+LHF E+G IV EKVRTEKKA R+ E+ +E ++ +IH+ D A

Sbjct: 183 SLLKIKPILHF-EDGSIVPLEKVRTEKKAWARVKELFAEEASSASSVKATVIHANRLDGA 241

Query: 241 EQLYHLLAKAGLKDDLEIVSFGGVIATHIiGEGAV 274 

E+L 4 + D+ I FG VI THLGEG++ 

Sbjct: 242 EKLADEIRSQFSHVDVS I SHFGPVIGTHLGEGSI 275 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6041> which encodes the amino acid
sequence <SEQ ID 6042>. Analysis of this protein sequence reveals the following: 

[...] N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3379 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 181/281 (64%) , Positives = 233/281 (82%) 

Query: 1 MKLALITDTSAYLPEAIENHEDVYVLDIPIIIDGKTYIEGQNLTLDQYYDKLARSKELPK 60
MKLA+ITD++A LP ++ + ++ LDIP+IID +TY EG+NL++D +Y K+A S+ LPK
Sbjct: 1 MKLAVITDSTATLPTDLKQDEOilFSLDIPVIIDDETYFEGENLSIDDFYQKMADSQNLPK 60

Query: 61 TSQPSLAELDDLLCQLEKEGYTHVLGLFIAAGISGFWQNIQFLIEEHPNLTIAFPDTKIT 120
TSQPSL+ELD+LL L +GYTHV+GLF+A GISGFWQNIQFL EEHP + +AFPD+KIT

[The remaining blocks are illegible in the source apart from the following fragments:]

SAP G++V+N L SR+GM F I+NK+Q QI+ FI+V+DLNHLVKGGRLSNGSA++

GNLLSIKP+L F+EEGKIWYEKVRTEKKA+KRL EI+ ++ ADG+Y++ IIHS+AQDKA

- L LL +G + D+E V FG VIATHLGEGA+AFG+TP+
JYLKRLLQDSGYQYDIEEVHFGAVIATHLGEGAIAFGVTPR 281

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1950 

A DNA sequence (GBSx2059) was identified in S.agalactiae <SEQ ID 6043> which encodes the amino 
acid sequence <SEQ ID 6044>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.59 Transmembrane 51 - 67 ( 50 - 67)

Final Results

bacterial membrane --- Certainty=0.1638 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6045> which encodes the amino acid
sequence <SEQ ID 6046>. Analysis of this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.19 Transmembrane 50 - 66 ( 49 - 67)

Final Results

bacterial membrane --- Certainty=0.2275 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

An alignment of the GAS and GBS proteins is shown below. 

Identities = 94/126 (74%) , Positives = 115/126 (90%) 

Query: 1 MEVIREQEFWQYHYDARNLEWEEENGTPKTNFEWFQIjUJRDEAAKVTSIVAVLQFVIV 60
M+++RE+EFVNQYHYDARNLEWE+ENGTP+TNFEVTFQL ++DE K T IV+VLQFVIV

Sbjct: 1 MQLTOEKEFWQyHYDARNLEWEKENGTPETNFEVTFQLIDKDEQQKETVIVSVLQFVIV 60

Query: 61 RDEFVISGVISQMAHIQGRLINEPSEFSQDEVENLAAPLLEIVKRLTYEVTEIALDRPGV 120
++EFVISGVISQM I RL+++PSEF+Q+EvE+IAAPLL++VKRLTYEVTEIALDRPG+
Sbjct: 61 KEEFVISGVISQMWILDRLTOKPSEFTQEEvESLAAPLLDMVKRLTYEVTEIALDRPGI 120

Query: 121 TLEFNS 126

LEF +

Sbjct: 121 HLEFKN 126


SEQ ID 6044 (GBS416) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 4; MW 17.5kDa). 

GBS416-His was purified as shown in Figure 214, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1951 

A DNA sequence (GBSx2060) was identified in S.agalactiae <SEQ ID 6047> which encodes the amino 
acid sequence <SEQ ID 6048>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3875 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1952 

A DNA sequence (GBSx2061) was identified in S.agalactiae <SEQ ID 6049> which encodes the amino 
acid sequence <SEQ ID 6050>. Analysis of this protein sequence reveals the following:

Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S. pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1953 

A DNA sequence (GBSx2062) was identified in S.agalactiae <SEQ ID 6051> which encodes the amino
acid sequence <SEQ ID 6052>. This protein is predicted to be PTS system, fructose-specific enzyme II,
BC component (fruA-1). Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 



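INTEGRAL Likelihood = [illegible] Transmembrane 630 - 646 ( 61[illegible] - 653)
INTEGRAL Likelihood = [illegible] Transmembrane 307 - 323 ( 303 - 331)
INTEGRAL Likelihood = [illegible] Transmembrane 415 - 431 ( 412 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 448 - 464 ( 444 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 595 - 611 ( 591 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 530 - 546 ( 529 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 350 - 366 ( 350 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 486 - 502 ( 486 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 376 - 392 ( 376 - [illegible])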



Final Results

bacterial membrane --- Certainty=0.5225 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9959> which encodes amino acid sequence <SEQ ID 9960> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04547 GB:AP001510 PTS system, fructose-specific enzyme II, BC 
component [Bacillus halodurans] 
Identities = 320/659 (48%), Positives = 438/659 (65%), Gaps = 46/659 (6%) 

Query: 1 MKIQDLLKKEVMIMDLKATSKEAAIDEMITKLVDTGVVTNFAIFKDGIMKREAQTSTGLG 60

+KI +LLKK+ M+++L+A SKEA IDE++ L G + + FK I++RE+Q++TG+G 
Sbjct: 2 LKISELLKKDTMVINLRAASKEAVIDELVRTLDKAGRLNDAQAFKRAILERESQSTTGVG 61 

Query: 61 DGIAMPHSKNAAVKEATVLFAKSASGVDYEAIJX^^ 120 

+GIA+PH+K AAVK+ + F +S +G+DYE+LDGQP+ LFFMIAA +GAN+ HL L+ L 
Sbjct: 62 EGIAIPHAKTAAVKQPAIAFGRSDAGIDVESLDGQPSHLFFMIAASEGANNEHLETLSRl, 121 

Query: 121 SKYLLKEGFADQLRQAKTPDDIIATFDSNSISQETVAPQTVQSTSKGSDYIVAVTACTTG 180

S +L+ E F L +A++ D+l+A D +E + +G + ++AVT C TG 

Sbjct: 122 STFLMDETFRSTLMKAQSEDEILAAID KKEAETAGEAEEKQEGYE - LLAVTGCPTG 176 

Query: 181 IAHTYMAEEALKJUCAAEMGVGIKVETWGASGVGNKLTSSDIARAKGVIIAADKAVEMDRF 240 

IAHTYMA + LK KA E+GV IKVETNG+ GV N+LT +1+ AK + I+AAD VEMDRF 
Sbjct: 177 IAHTYMAADNLKSKAQELGVSIKVETNGSGGVKNRLTDEEISAAKAIIVAADTICVEMDRF 236 

Query: 241 

GKP++ PV DGI++ ++LI+ 
Sbjct: 237 HGKPV1 QVPVTDGI RRPKELI DQALAGKAPVY ] 



Query: 298 HLMGGVSQMLPFVIGGGIMIAIAFLFDNILGVPKDQLSNLGSYHEIAALFKNIGGA-AFA 356 

HLM GVS MLPFV+GGGI + IAI+F+F P D SYH A + IGG AF 

Sbjct: 293 HLMNGVSNMLPFWGGGILIAISFMFGIKAFDPSDP SYHPFAEMLMTIGGGNAFG 347 



Query: 357 FMLPVLAGYIAYSIAEKPGLVAGFVAGSIASSGLAFGKVPFAEGGKATLALAGVPSGFLG 416 

M+PVLA +IA SIA++PG AG + G IAS+G A GFLG 
Sbjct: 348 LMIPVLAAFIAMSIADRPGFAAGMIGGLIASTGEA GFLG 386 

Query: 417 ALVGGFLAGGVILLLRKLLSGLPKSLEGIKSILLYPLLGVLITGFLMLLVNIPMAAINTA 476 

L+ GFLAG V L ++K+L+ LP++L4GIK+IL YP+ + ITG +ML++ P+AA NT 
Sbjct: 387 GLIAGFIAGWALGVKKVIJ^PQTLDGIKTILFYPVFNIFITGMIMLVIVGPI^FNTG 446 

Query: 477 LOTFLQGLSGSSAVLMGLLVGGMMAVDMGGPWKAAYVFGTGTLAATVANGGSVVMAAVM 536 

L +L + ++ V++G+++GGMMAVDMGGP+NKAA+ FG + A G AAVM 
Sbjct: 447 LQDWLGSMGTANMVI LGVI LGGMMAVDMGGP INKAAFTFGIAMIDA GNFGPHAAVM 502 

Query: 537 AGGWPPIAVWATLLFKDKFNNEERQSGIjTNIVMGLSFITEGAIPFGAADPARAIPSFI 596 

AGGMVPPL + +AT LFK KF +ER++G TN ++G SFITEGAIPF AADP R IPS I 
Sbjct: 503 AGGMVPPLGIA1ATTLFKKKFTKQEREAGKTNYILGASFITEGAIPFAAADPGRVIPSI1 562 

Query: 5S7 VGSALTGALVGLAGIKLMAPHGGIFVI-— ALTSNPLLYILFILIGAWSGVLFGLFRK 552 

VGSA G h L + L APHGG FVI + +NPLLY++ 1+ G++V+ +L G ++K 
Sbjct: 563 VGSAFAGGLTALFNVTLSAPHGGAFVIFIGNIVNNPLLYLVAI IAGSIVTALLLGFWKK 621 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6053> which encodes the amino acid 
sequence <SEQ ID 6054>. Analysis of this protein sequence reveals the following: 

Possible site: 18



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -[illegible].77 Transmembrane 624 - 640 ( 612 - 646)
INTEGRAL Likelihood = -7.59 Transmembrane 301 - [illegible] ( 297 - 321)
INTEGRAL Likelihood = [illegible] Transmembrane 442 - 458 ( 439 - [illegible])
INTEGRAL Likelihood = -5.95 Transmembrane 409 - 425 ( 406 - 426)
INTEGRAL Likelihood = -3.61 Transmembrane 524 - 540 ( 523 - 547)
INTEGRAL Likelihood = -2.50 Transmembrane 337 - 353 ( 337 - 353)
INTEGRAL Likelihood = -2.[illegible] Transmembrane 589 - 605 ( 589 - 605)
INTEGRAL Likelihood = -1.[illegible] Transmembrane 480 - 496 ( 480 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane 370 - 386 ( [illegible] - 386)

Final Results

bacterial membrane --- Certainty=0.5310 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB04547 GB:AP001510 PTS system, fructose-specific enzyme II, BC
component [Bacillus halodurans] 
Identities = 322/659 (48%), Positives = 431/659 (64%), Gaps = 48/659 (7%)



MKIQDLLRKDIMILDLQAISKEVAIDEMITKLVEKDITODFDVFKKSIMTREEQTSTGLG 60 
+KI +LL+KD M+L+L+A SKE IDE++ L + ++D FK++I+ RE Q++TG+G 
LKISELLKICDTMVLNLRAASKEAVIDELWTI^KAGRLNDAQAFKRAILERESQSTTGVG 61 

Query: 121 SQYLLKDGFADKLRAAATPEAVIAVFD- -3ASTAKEEWAPTSGQDFIVAVTACPTGIAH 178

S +L+ + F L A + + ++A D EA TA E + ++AVT CPTGIAH 

Sbjct: 122 STFLMDETFRSTLMKAQSEDEILAAIDKKEAETAGEAEEKQEGYE--LIAVTGCPTGIAH 179 

Query: 179 TYMAEEALKKQAAEMGVAIKVETNGASGVANRLTAEDIQRAKGVIVAADKAVEMDRFDGK 238 

TYMA + LK +A E+GV+IKVETNG+ GV NRLT E+I AK +IVAAD' VEMDRF GK 
Sbjct: 180 TYMAADNLKSKAQELGVSIKVETNGSGGVKNRLTDEEISAAKAIIVAADTKVEMDRFHGK 239 

Query: 239 QFIARPVADGIKKSQEniSLILNNEGNTYHAKNGKSETAVSTEKTSLGG AFYKHL 293 

I PV DGI++ +ELI L+Y + SESGG FYKHL 

Sbjct: 240 PVIQVPVTDGIRRPKELIDQALAGKAPVY EGGAQASGEDGSAGGGRPKLGFYKHL 294 

Query: 294 MGGVSQMLPFVIGGGIMIALAFLLDNMLGVPNDQLGSLGSYHEIAAIFMNIGGA-AFSFM 352 

M GVS MLPFV+GGGI+IA++F+ P+D SYH A + M IGG AF M 

Sbjct: 295 MNGVSNMLPFWGGGILIAISFMFGIKAFDPSDP SYHPFAEMLMTIGGGNAFGLM 349 

Query: 353 LPVLAGYIAYSIAEKPGLVAGFVAGAIASNGIAFGKVPFAAGGEVSLGLTGVPSGFLGAL 412 

+PVLA +IA SIA++PG AG + G IAS G A GFLG L 

Sbjct: 350 IPVLAAFIAMSIADRPGFAAGMIGGLIASTGEA GFLGGL 388 

Query: 413 VGGFIAGGVIIALRia.IAGLPRSLEGVKSILLYPLLGVLVTGFLMLFVNIPMAA.INTALN 472 

+ GFLAG V L ++K+LA LP++L+G+K+IL YP+ + +TG +ML + P+AA NT L 
Sbjct: 389 IAGFIAGWALGVKKVLANLPQTLDGIKTILFYPVFNIFITGMIMLVIVGPLAR.FNTGLQ 448 

Query: 473 DFLQGLSGSSAVLMGL,LVGGt#tAVDMGGPVNKAAWFGTGTLAaTVANGGSVVMAAVMAG 532 

D+L + ++ V++G+++GGMMAVDMGGP+NKAA+ FG + A G AAVMAG 
Sbjct: 449 DWLGSMGTANMVILGVILGGMMAVDMGGPINKAAFTFGIAMIDA GNFGPHAAVMAG 504 

Query: 533 GMVPPLAVFVATLLFKDKFTKEERESGLTNIVMGLSFITEGAIPFGAADPARAIPSFIAG 592 

GMVPPL + 4AT LFK KFTK+ERE+G TN ++G SFITEGAIPF AADP R IPS 1 G 
Sbjct: 505 GWPPLGIAIATTLFKKKFTKQEREAGKTNYILGASFITEGAIPFAAADPGRVIPSIIVG 5S4 

Query: 593 SALTGALVGLAGIKLMAPHGGIFVI ALTSNPILYLVFWIGALVSGILFGALRKKA 548 

SA G L L + L APHGG FVI + +NP+LYLV ++ G++V+ +L G +K A 
Sbjct: 565 SAFAGGLTALFNVTLSAPHGGAFvT FIGNIVNNPLLYLVAI IAGSIVTALLLGFWKKDA 623 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 526/652 (80%), Positives = 581/652 (88%), Gaps = 6/652 (0%)

Query: 1 MKIQDLLKKEVMIMDLKATSKEAAIDEMITKLVDTGVVTNFAIFKDGIMKREAQTSTGLG 60

MKIQDLL+K++MI +DD+A SKE AIDEMITKLV+ +V +F +FK IM RE QTSTGLG 
Sbjct: 1 MKIQDLLRKDIMILDLQAISKEVAIDEMITKLVEKDIVHDFDVFKKSIMTREEQTSTGLG 60 

Query: 61 DGIAMPHSKNARVKEATVLFAKSASGVDYKALEGQPTDLFFMIAAPDGAWJTHIiAALAEL 120 

DGIAMPHSKN V + VLFAKS GVDY+ALDGQPTDLFFMIAAP GANDTHLAALAEL 
Sbjct: 61 DGIAMPHSKNI VVDKPAVLFAKSNKGVDYKAUDGQPTDLFFMIAaPQGAHDTHIiARLAEL 120

Query: 121 SKYLiLKEGFADQLRQAKTPDDIIATFDSNSISQETVAPQTVQSTSKGSDYIVAVTACTTG 180 

S+YLLK+GFAD+LR A TP+ +IA FD S ++E V T G D+IVAVTAC TG 

Sbjct: 121 SQYLLKDGFADKLRAAATPEAVIAVFDEA8TAKEEWAPT SGQDFIVAVTACPTG 175 

Query: 181 IAHTYMAEEMjKKKAAEMGVGIKVETNGASGVGKKLTSSDIARAKGVI IAADKAVEMDRF 240 

IAHTYMAEEALKK+AAEMGV IKVETNGASGV N+LT+ DI RAKGVI +AADKA VEMDRF 
Sbjct: 176 IAHTYMAEEALKKQAAEMGVAIKVETHGASGVANRLTAEDIQRAKGVIVAADKAVEMDRF 235 

Query: 241 DGKPLVSRPVADGIKKSEDLINIILDNKAQTYHAKNQNDKQSGESDGKSGLGSAFYKHLM 300 
DGK ++RPVADGIKKS++LI++IL+N+ TYHAKN ++ S K+ LG AFYKHLM 

Query: 301 GGVSQMLPFVIGGGIMIAIAFLFDNILGVPfCDQLSNLGSYHEIAALFKNIGGAAFAFMLP 360 

GGVSQMLPFVIGGGIMIA+AFL DN+LGVP DQL +LGSYHEIAA+F NIGGAAF+FMLP 
Sbjct: 295 GGVSQMLPFVIGGGIMIALAFLLDMLGVPNDQLGSLGSYHEIAaiFMNIGGAAFSFMLP 354 

Query: 361 VLAGYIAYSIAEKPGLVAGFVAG3IASSGLAFC-KVPFAEG3KS.TLALAGVPSGFLQALVG 420

VLAGYIAYSIAEKPGLVAGFVAG+IAS+GLAFGKVPFA GG+ +L L GVPSGFLGALVG 
Sbjct: 355 VLaGYIAYSIAEKPGLVAGFVAGAIASNGLAFGKVPFAAGGEVSLGLTGVPSGFLGALVG 414 

Query: 421 GFLAGGVI LLLRKLLSGLPKSLEGIKSILIiYPLIiGVIjITGFLMIiLVNI PMAAINTALNTF 480 

GFLAGGVIL LRKLL+GLP+SLEG+KSILLYPLLGVL+TGFLML VNIPMAAINTALN F 
Sbjct: 415 GFLAGGVILALRKLLAGLPRSLEGVKS ILLYPLLGVLVTGFLMLFVNI PMAAINTALNDF 474 

Query: 481 LQGLSGSSAVLMGLLVGG^^VDMGGPWKAAYWGTGTLAATVANGGSVVMAAVMAGGM 540 

LQGLSGSSAVLMGLLVGG^VDMGGPWKAAYWGTGTI^TVANGGSVVM^VMAGGM 
Sbjct: 475 LQGLSGSSAVLMGLLVGGMMAVDMGGPWKAAYWGTGTIjAATVANGGSVVMAAVNAGGM 534 

Query: 541 VPPIAVFVATLLFKDKFNNEERQSGLTNIVI'IGLSFITEGAIPFGAADPARaiPSFIVGSA 600 

VPPLAVFVATLLFKDKF EER+SGLTNIVMGLSFITEGAIPFGAADPARAIPSFI GSA 
Sbjct: 535 VPPLAVFVATLLFKDKFTKEERESGLTNIVMGLSFITEGAIPFGAADPARAIPSFIAGSA 594 

Query: 601 LTGALVGIAGI KLMAPHGGI FVIALTSNPLLY I LF I L IGAWSGVLFGLFRK 652 

LTGALVGLAGIKLMAPHGGI FVIALTSNP+LY++F++IGA+VSG+LFG RK 
Sbjct: 595 LTGALVGLAGIKLMAPHGGIFVIALTSNPILYLVFWIGALVSGILFGALRK 646 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1954 

A DNA sequence (GBSx2063) was identified in S.agalactiae <SEQ ID 6055> which encodes the amino 
acid sequence <SEQ ID 6056>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1532 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MIYTVTLNPSIDFIVRLDTLLLGSVNRMTSDDKYVGGKGINVSRILKRLKIDNTATGFIG 60 

MIYTVTLNPS+D+IV ++ +G +NR + D KY GGKGINVSR+LKR + + A GF+G 
Sbjct: 1 MIYTVTLNPSVDYIVHVEDFTVGGLNRSSYDTKYPGGKGINVSRLLKRHHVASKALGFVG 60

Query: 61 GFTGHFVEDGLVLEGIKTDFVSVNEDTRINVKVKAKIETEINGGGPRITNEQLHRLEKLL 120

GFTG +++ L E ++T F V DTRINVK+K ETEING GP I++E + 
Sbjct: 61 GFTGEYIKTFLREENLETAFSEVKGDTRINVKLKTGDETEINGQGPTISDEDFKAFLEQF 120 

Query: 121 SRLTPEDTWFAGSAPASLGNIWYNTIilPIAKKTGAEWCDFEGQTLLDALAYQPLLVKP 180 

h D W AGS P+SL + Y + K+ A W D G+ LL A +P L+KP 
Sbjct: 121 QSLQEGDIWLAGS I PSSLPHDTYEKIAEACKQQNARWIiDI SGEALLKATEMKPFLMKP 180 

Query: 181 NNHELADIFGVELEGLPDIEKYAHKILDKGAKNVIVSMAGDGALLVTPEASYFAKPIKGE 240

N+HEL ++FG + + + Y K++++GA++VIVSMAGDGALL T EA YFA KG+ 
Sbjct: 181 NHHELGEMFGTAITSVEFAVPYGKKLVECGAEHVIVSmGDGALLFTNEAVYFANVPKGK 240 

Query: 241 VKNSVGAGDSMVAGFTGEFVKSKNPVEALKWGVACGTATTFSDDLATAEFIQDIYNKVEV 300

+ NSVGAGDS+VAGF K EA + GV G+AT FS++L T EF+Q + +V+V 

Sbjct: 241 LVNSVGAGDSWAGFLAGISKQLPLEEAFRLGOTSGSATAFSEELGTEEFVQQLLPEVKV 300 

Query: 301 EKL 303 
+L 

Sbjct: 301 TRL 303 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6057> which encodes the amino acid
sequence <SEQ ID 6058>. Analysis of this protein sequence reveals the following:

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1738 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/302 (73%) , Positives = 261/302 (85%) 

Query: 1 MIYTVTLNPSIDFIVRLDTLLLGSVNRMTSDDKYVGGKGINVSRILKRLKIDNTATGFIG 60 

MIYTvTLNPSIDFIVR+D + LGSVNRM SDDK+ GGKGINVSRIL+RL I +TATGF+G 
Sbjct: 1 MIYTVTLNPSIDFIVRIDQINLGSVNRMASDDKFAGGKGINVSRILQRLDIASTATGFLG 60 

Query: 61 GFTGHFVEDGLVLEGIKTDFVSVNEDTRINVKVKAKIETEINGGGPRITNEQLHRLEKLL 120

GFTG F+E+ L EG+KTDFV ++DTRINVK+K++ ETE+NG GP 1+ EQL L+ L 
Sbjct: 61 GFTGRFIEESLSAEGVKTDFVKGDQDTRINVKIKSQEETELNGQGPIISQEQLEDLKTKL 120 

Query: 121 SRLTPEDTWFAGSAPASLGNKVYNTLIPIAKKTGAEWCDFEGQTLLDALAYQPLLVKP 180 

S+LT EDTWFAGSAPA+LGN VY L+P+ +++GA+WCDFEGQTL+DALAY PLLVKP 
Sbjct: 121 SQLTAEDTWFAGSAPANLGNAVYKELLPLVRQSGAQWCDFEGQTLIDALAYNPLLVKP 180 

Query: 181 NNHELADIFGVELEGLPDIEKYAHKILDKGAKNVIVSMAGDGALLVTPEASYFAKPIKGE 240

NNHEL IFG L L D+E YA ++L+ GA+NVI+SMAGDGALLVT EA+YFAKPIKGE 
Sbjct: 181 NNHELEAI FGTI LTSLDDVETYARRLLEMGAQNVI I SMAGDGALL VTKEATYFAKP I KGE 240 

Query: 241 VKNSVGAGDSMVAGFTGEFVKSKNPVEALKWGVACGTATTFSDDLATAEFIQDIYNKVEV 300 

VKNSVGAGDSMVAGFTGEF+KS+NP+EALKWGVACGTAT FSDDLAT FI++ Y+KVEV 
Sbjct: 241 VKNSVGAGDSMVAGFTGEFMKSQNPIEALKWGVACGTATAFSDDLATIAFIKETYHKVEV 300 

Query: 301 EK 302 
EK 

Sbjct: 301 EK 302 
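The identity count reported for an alignment can be recomputed from a pair of aligned rows. The sketch below assumes the two rows are already aligned to equal length with `-` marking gaps; the function names `count_identities` and `match_line` are hypothetical. It does not reproduce the `+` (positive) marks, which depend on a substitution matrix that is not given in the text.

```python
def count_identities(query, sbjct):
    """Count columns in which both aligned rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")

def match_line(query, sbjct):
    """Build the middle line of an alignment block, identities only."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join(q if q == s and q != "-" else " " for q, s in zip(query, sbjct))
```

For the final block above (Query `EK`, Sbjct `EK`), both columns are identities and the middle line repeats the residues.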

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1955

A DNA sequence (GBSx2064) was identified in S.agalactiae <SEQ ID 6059> which encodes the amino 
acid sequence <SEQ ID 6060>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2769 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9961> which encodes amino acid sequence <SEQ ID 9962> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC24913 GB:AF012285 FruR [Bacillus subtilis] 
Identities = 97/247 (39%), Positives = 148/247 (59%), Gaps = 4/247 (1%)

Query: 23 MLKSKRKEIILSRLEQNKSVTLDELTSILETSESTORRDLDELESAGFLKRVHGGAELPY 82 

Sbjct

Query: 83 SLGQELSNQEKAIKNVQKKLDIARQTAKLIAKQDVIFIDAGTTTELLIDFLPH-EQLTW 141 

+ E EK+ KN+ KL IA + A L+ + D I++DAGTTT +IDF+ + + W 
Sbjct: 61 DIRLEPDMLEKSSKNLHDKLKIAEKAASLLEEGDCIYLDAGTTTLHMIDFMDKTKDIVW 120 

Query: 142 TNSIHHAAKLVDRGIKTIIIGGAVKHSTDASIGQVAINQIRQITVDKAFLGMNGID-EVY 200 

TN + H L+ + I ++GG VKH T A IG ++ + Q DK+FLG NG+ E 
Sbjct: 121 TNGVMHIDALIRICEISFYLLGGYVICHRTGAIIGGASLVAMDQYRFDKSFLGTNGVHTEAG 180 

Query: 201 LTTPDLEEAAIKEAIIKNSQQTFILMDSSKIGQVTFAKVKEINDINLVTNKTDSELMTII 260 

TTPD +EA +K+ 1 ++ ++L D SK G+++F+ I D ++T TD+E +T 
Sbjct: 181 FTTPDPDEALLKQKAIKQAKH&YVLADPSKFGEISFSAFAGIGDATIIT- -TDAEELTFD 238 

Query: 261 KEKMKVI 267 
+ K + 

Sbjct: 239 NYQEKTV 245 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6061> which encodes the amino acid
sequence <SEQ ID 6062>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2604 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 135/237 (56%) , Positives = 184/237 (76%) 

Query: 33 LSRLEQNKSVTLDELTSILETSESTVRRDLDELESAGFLKRVHGGAELPYSLGQELSNQE 92 

++++ + V+L++L +L +SEST+RRDL ELE G L RVHGGAEL +SL +ELSNQE 
Sbjct: 1 IV1AKITEENYVSLEDLMQLUJSSESTIRRDLGELEQEGRLHRVHGGAELFHSLQEELSNQE 60 

Query: 93 KAIKNVQKKLDIARQTAKLIAKQDVIFIDAGTTTELLIDFLPHEQLTVVTNSIHHARKLV 152 

K++KN K IA++ ++LI DVIFIDAGTTTE L+ FL + LTWTNS I HHAA+LV 
Sbjct: 61 KSTONSHIlOCAlAQRASQLIYDNDVIFIDAGTTTEFLLPFLQAKNLTvVTNSIHHAARLV 120 

Query: 153 DRGIKTIIIGJGAVTCHSTDASIGQVAINQIRQITVDKAFLGMKGIDEVYLTTPDLEEAAIK 212 

+ I+TII+GG VK +TDASIG VA+ QIRQ+ DKAFLGMNG+D+ YLTTPD+EEA IK 
Sbjct: 121 ELSIETIIVGGYVKQTTDASIGNVALEQIRQMNFDKAFLGMNGVDDSYLTTPDMEEAVIK 180 

Query: 213 EAIimSQQTFILMDSSKIGQVTFAKVKEINDINLVTNKTDSELMTIIKEKMKVIQV 269 

+A+++N++ +IL+D +KIGQV+F KV IND+ ++T + ++ IKEK KVI++ 
Sbjct: 181 KAVLSNAKIAYILVDGTKIGQVSFVKVAPINDWIITLGGSASILKQIKEKAKVIEL 237 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1956 

A DNA sequence (GBSx2065) was identified in S.agalactiae <SEQ ID 6063> which encodes the amino 
acid sequence <SEQ ID 6064>. This protein is predicted to be beta-lactam resistance factor. Analysis of this 
protein sequence reveals the following: 
Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.5777 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB89121 GB:AJ277485 beta-lactam resistance factor

[Streptococcus pneumoniae] 
Identities = 215/410 (52%) , Positives = 283/410 (68%) 

Query: 1 MTLRELTIEEFKEHSGNYDSQSFLQTPEMJUCLLEICRGYDVRYLGyQVENKLEIISLSYIM 60 
ML LT EEF+ +S S+SF+Q+ +M LLEKRG + YL + E ++++ +L Y +

Sbjct: 1 MMTTLTKEEFQTYSDQVSSRSFMQSVCMGDLLEKRGARIVYLALKQEGEIQVAALVYSL 60 

Query: 61 PVTGGFQMKIDSGPVHSNSKYLKQFYKALQGYAKSNGVLELIVEPYDDYQLFTSSGVPSN 120
P+ GG M+++SGP+++ L FY L+ YAK NGVLEL+V+PY+ YQ F S G P +
Sbjct: 61 PMLGGLHMELNSGPIYTQQDALPVFYAELKEYAKQNGVLELLVKPYETYQTFDSQGNPID 120

Query: 121 QGNDNLIEDFTSSGYHHDGLTTGFTGK^LSIffiWKmiEGvTSETLLSSFSKTGRALVKKA 180 

++I+D T GY DGLTTG+ G W Y K+L +T ++LL SFSK G+ LVKKA 
Sbjct: 121 AEKKSIIQDLTDLGYQFDGLTTGYPGGEPDWLYYKDLTELTEKSLLKSFSKKGKPLVKKA. 180 


Query: 181 MSFGIKVRVLKRDELHLFKEITTSTSNRRDWnDKSLDYYQDFYDSFEGKAEFVIATLNFR 240 

+FGI+++ LKR+EL +FK IT TS RR+Y DKSL+YY+ FYD+F +AEF+IA+LNF 
Sbjct: 181 ETFGIRLKKLKREELSIFKNITKETSERREYSDKSLEYYEHFYDTFGEQAEFLIASLNFS 240 

Query: 241 EYDHNDQIKAEALENKLKLLDERFRENADSPKYHRQRSEIINQLASFETRRQEVQSFIQK 300

+Y LQ + LE L L +N S K Q E +Q +FE R+ E + I+K 

Sbjct: 241 DYMSKLQGEQSKLEENLDKLRLDLSKNPHSEKKQNQLREYSSQFETFEVRKAEARDLIEK 300 

Query: 301 YDNQDVVlAGSLFWSLKETvYFFSGSYTEFNKFYAPAVLQEYVMQEMljKRGSTFYNLLG 360 
Y +D+VLAGSLFVY +ET Y FSGSYTEFNKFYAPA+LQ+YVM E++KRG YN LG

Sbjct: 301 YGEEDIVIAGSLFVYMPQETTYLFSGSYTEFNKFYAPALLQKXVMLESIKRGIPKYNFLG 360 

Query: 361 IQGTEDGSDSILRFKQNFNGCIIRKMGTFNYYPSPFKYKGIQLLKKvLKR 410 
IQG FDGSD +LRFKQNFNG I+RK GTF Y+PSP KYK IQLLKK++ R 
Sbjct: 361 IQGIFMSDGVIjRFKQNFNGYITOKftGTFRYHPSPLKYKAIQLLKKIVGR 410

There is also homology to SEQ ID 5460. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1957

A DNA sequence (GBSx2066) was identified in S.agalactiae <SEQ ID 6065> which encodes the amino 
acid sequence <SEQ ID 6066>. This protein is predicted to be cell wall protein, 40 kDa (sr 5' region). 
Analysis of this protein sequence reveals the following: 

Possible site: 42 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.45 Transmembrane 25 - 41 ( 23 - 42) 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9963> which encodes amino acid sequence <SEQ ID 9964> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

!GB:AF278686 choline binding protein D; CbpD [Strept... 

>GP:AAF87768 GB:AF278686 choline binding protein D; CbpD
[Streptococcus pneumoniae]
Identities = 63/230 (27%), Positives = 108/230 (46%), Gaps = 34/230 (14%)

Query: 324 WTEQGGQDDI KWYTAVTTGDG -NYKWiVSFADHKNEKGLYNIHLYYQEASGTLVG 377 

W+ G + W + V GD NY S+ + +++++ G VG 

Sbjct: 123 WSTAGTYGHVAWVSNVM-GDQIEIEEYNYGYTESYNKRVIKM0TMTGF1HFKDLDGGSVG 181 

Query: 378 OTGTKVTVAGTNSSQEPIENGLAKTGVYNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQ 437 

+ + + GT+ + + +K E S G+K++YDQ 

Sbjct: 182 NSQSSTSTGGTHYFKT KSAIKTEPLASGTVIDYYYPGEKVHYDQ 225 

Query: 438 VLTADGYQWISYKSYSGVRRYIPVKKLTTSSEKAKDEATKPTSYPNLPKTG-TYTFTKTV 496

+L DGY+W+SY +Y+G RY+ ++ + + P L TG T+ F 

Sbjct: 226 I LEKDGYKWLS YTAYNGS YRYVQLEAVNKN --PLGNSVLSSTGGTHYFKTKS 275 

Query: 497 DVKSQPKVSSPVEFNFQKGEKIHYDQVLWDGHQWISYKSYSGIRRYIEI 546 

+K++P VS+ V + GEK+HYDQ+L DG++W+SY +Y+G RRYI++ 
Sbjct: 276 AIKTEPLVSATVIDYYYPGEKVHYDQILEKDGYKWLSYTAYNGSRRYIQL 325 
Identities = 49/161 (30%) , Positives = 85/161 (52%) , Gaps = 14/161 (8%) 

Query: 116 GNYVYSKETEVKNTPSICSAPVAFYAKKGDKVFYDQVFNICDNVKWISYKSFCGVRRYAAIE 175 

G + + +++K P S V Y G+KV YDQ+ KD KW+SY ++ G RY +E 
Sbjct: 191 GTHYFKTKSAIKTEPLASGTVIDYYYPGEICVHYDQILEKDGYKWLSYTAYNGSYRYVQLE 250 

Query: 176 SLDPSGGSETKAPTPvTNSGS^QEKIATQGNYTFSHKVEVTOJEAKVASPTQFTLDKGDR 235 

+++ + P+ NS + +T G + F K +K E V++ G++ 

Sbjct: 251 AVNKN PLGNSVLS STGGTHYFKTKSAIKTEPLVSATVIDYYYPGEK 296 

Query: 236 IFYDQILTIEGNQWLSYKSFNGVRRFVLLGKASSVEKTEDK 276 

+ YDQIL +G +WLSY ++NG RR++ L +S + +++ 
Sbjct: 297 VHYDQILEKDGYKWLSYTAYNGSRRYIQLEGVTSSQNYQNQ 337 
Identities = 52/192 (27%), Positives = 90/192 (46%), Gaps = 13/192 (6%) 



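[Alignment rows not recoverable. Surviving fragments: match line "+ EK Y+ h"; start coordinates Query 295 / Sbjct 161, Query 355 / Sbjct 216, Query 410 / Sbjct 274, Query 470 / Sbjct 333.]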



Identities = 33/113 (29%) , Positives = 56/113 (49%) , Gaps = 2/113 (1%) 

Query: 91 NTATKDITTPLVETKPMVEKTLPEQGNYVYSK-ET3VKNTPSKSAPVAFYAKKGDICVFYD 149 

N+ + + V P+ L G YK+++KPSAVY G+KV YD 
Sbjct: 241 NGSYRWQLEAWKNPLGNSVLSSTGGTHYFKTKSAIKTEPLVSATVIDYYYPGEKVHYD 300 

Query: 150 QVFNKDNVKWISYKSFCGVRRYAAIESLDPSGGSETKaPTPVTNSGSNNQEKI 202 

Q+ KD KW+SY ++ G RRY +E + S + ++ +++ GS++ + 
Sbjct: 301 QILEKDGYKWLSYTAYNGSRRYIQLEGVTSSQKYQNQSGN- ISSYGSHSSSTV 352 

A related GBS gene <SEQ ID 8937> and protein <SEQ ID 8938> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -6.74 
GvH: Signal Score (-7.5) : 1.26 
Possible site: 42 
>>> Seems to have no N-terminal signal sequence

ALOM program count: 1 value: -3.45 threshold: 0.0 

INTEGRAL Likelihood = -3.45 Transmembrane 22 - 39 ( 23 - 42)
PERIPHERAL Likelihood = 6.26 371 

modified ALOM score: 1.19 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

41.2/57.9% over 283aa 

Streptococ 

EGAD | 33594 | cell wall protein, 40 kDa (sr 5' region) Insert characterized 
PIR|A60328 |A60328 40K cell wall protein precursor (sr 5' region) - (strain OMZ 
175, serotype f) Insert characterized 

ORF02145(301 - 1803 of 2238) 

EGAD|33594|34911 (30 - 313 of 335) cell wall protein, 40 kDa (sr 5' region) [Streptococcus mutans]
PIR|A60328|A60328 40K cell wall protein precursor (sr 5' region) - Streptococcus mutans (strain OMZ175, serotype f)
%Match =8.0 

%Identity =41.1 %Similarity =57.9 

Matches = 81 Mismatches = 79 Conservative Sub.s = 33 

156 186 216 246 276 306 336 366 

* Y J^***FCYTKOTIKSWVFFSRSIYSIKYYICITNISKIC*HVTKR^ 

JOTQKIWISSFYMLGAHSFSKAVYHNDRSVKLMKRIDIlIHQAQRFSIRIOrA 



FGLASVILGSFIMVTSPVFADQTTSVQVMJQTGTSVDA1MSSNETS 

|| |||::| : : : | |: I -1= =1111=11 II I 

FGAASVLIGCVFFLGTQNVSAQEQGTQL PASENAWNVAENSVAISQAVADKAATQTTLTETPQV 






- -TLPEQGNYVYSKETEVKNTPSKSAPVAF 
11111= =1111 I 1=1 



130 



140 200 210 220 230 240 



744 1533 1563 1593 1623 1653 1683 

YAKKGDKATFYDQVFNKD G'/YNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQVLTADGYQWISYKSySGVRRYIPV 

III ==1111= II II 111=11111 1111=111 1= 

TQFNFDKGDKVFYDNVLEADGHQWI SYVSYSGIRRYAPI 

250 260 270 

1713 1743 1773 1803 1833 1863 1893 1923 

KKLTTSSEKAKDEATKPTSYPNLPKTGTYTFTKT^VKSQPCTSSPvEFNFQKGEKIHYDQVLvVDGHQWISYKSYSGIR 

= == I III III III =1 > I 

AVTIEELKQKEIVQQNLPAQGTYHFTKQQSLKMKLNCLVRPKSRFTTEITFFMIRF 

290 300 310 320 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6067> which encodes the amino acid
sequence <SEQ ID 6068>. Analysis of this protein sequence reveals the following: 

Possible site: 23 
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF87768 GB:AF278686 choline binding protein D; CbpD 
[Streptococcus pneumoniae] 
Identities = 93/217 (42%) , Positives = 136/217 (61%) , Gaps = 18/217 (8%) 



GD+YP+ +K G+ ID W MY RQCTSF AFRLS+ NGF++P YGNA WGH A+ +GY 



V+ TP+IG+I W 





[Row labels separated from their sequences: Query 42 / Sbjct 51; Query 101 / Sbjct 111; Query 161 / Sbjct 162; Query 221 / Sbjct 214.]



I- ++G+IHFKDL + + SQ+S GT++F T+ +K + + 

JTMTGFIHFKDLDGGSVGN SQSSTSTGGTHYFKTKSAI KTEPLASGTVI D 213 



YY G+ V+YD+++ GY WLSY +++G+ RY+ + 



An alignment of the GAS and GBS proteins is shown below. 
Identities = 34/94 (36%), Positives = 52/94 (55%)

Query: 453 SGVRRYIPVKKLTTSSEKAKDEATKPTSYPNLPKTGTYTFTKTm)VKSQPKVSSPVEFNF 512 

S V YI K L++ + + K S + +GTY FT + VK Q + SP + 

Sbjct: 163 SQVSGYIHFKDLSSQTSHSYPRQLKHISQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYY 222 

Query: 513 QKGEKIHYDQVLVVDGHQWISYKSYSGIRRYIEI 546

+ G+ ++YD+V+ G+ W+SY S+SG RRYI I 
Sbjct: 223 EAGQSVYYDKWTAGGYTWLSYLSFSGNRRYIPI 256 
Identities = 30/78 (38%) , Positives = 45/78 (57%) , Gaps = 2/78 (2%) 

Query: 402 TGVYNIIGSTEVKNEAKISSQTQFTLEKGDKINYDQVLTADGYQWISYKSYSGVRRYIPV 461 

+GY+ VK+IS EG + YD+V+TA GY W+SY S+SG RRYIP+ 

Sbjct: 197 SGTYHFTTRLPVKGQTSIDSPDIAYYEAGQSVYYDKVVTAGGYTWLSYLSFSGNRRYIPI 256 

Query: 462 KKLTTSSEKAKDEATKPT 479 

K+ + +++ TKP+ 
Sbjct: 257 KE - - PAQS WQNDNTKPS 272 
Identities = 27/94 (28%) , Positives = 47/94 (49%) 

Query: 198 NQEKIATQGNYTFSHIWEVKNEAKVASPTQFTLDKGDRIFYDQILTIEGNQWLSYKSFNG 257 

+Q G Y F+ ++ VK + + SP + G ++YD+++T G WLSY SF+G 

Sbjct: 190 SQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYYEAGQSVYYDKWTAGGYTWLSYLSFSG 249 



i VRRFVLLGKASSVEKTEDKEKVSPQPQARITKTG 291 



Query: 103 ETKP^^VEKTLPEQGNYWSKETEVKN^PSKSAPVAFYAKKGDKVFYDQVFMKDNVKWISY 162 

+ K + + + GY++ VK S+P Y + G V+YD+V W+SY 
Sbjct: 185 QLKHISQASFDPSGTYHFTTRLPVKGQTSIDSPDLAYYEAGQSVYYDKVVTAGGYTWLSY 244 

Query: 163 KSFCGVRRYAAIE 175 

SF G RRY 1 + 
Sbjct: 245 LSFSGNRRYIPIK 257 

SEQ ID 8938 (GBS91) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 18 (lane 7; MW 63kDa). 

The GBS91-His fusion product was purified (Figure 195, lane 9) and used to immunise mice. The resulting 
antiserum was used for FACS (Figure 283), which confirmed that the protein is immunoaccessible on GBS 
bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1958 

A DNA sequence (GBSx2067) was identified in S.agalactiae <SEQ ID 6069> which encodes the amino 
acid sequence <SEQ ID 6070>. This protein is predicted to be thiamine biosynthesis protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0984 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB49673 GB:AJ248285 PROBABLE 2-DEHYDROPANTOATE 2-REDUCTASE (EC
1.1.1.169) [Pyrococcus abyssi] 
Identities = 85/301 (28%), Positives = 150/301 (49%), Gaps = 7/301 (2%) 

Query: 1 MLVYIAGSGAMGCRFGYQISKTNHDVILLDITOADHIMAIKENGLKVTGDTEDIjVKIjPIMK 60
M +YI G+GA+G FG ++ DV+L+ H+ AI E GLK+ G + VK+

Sbjct: 1 MKIYILGAGAIGSLFGGL1ANAGEDVLLIGR-DPHVSA1NEKGLKIVGIKDLNVKVEATT 59

Query: 61 PTDATEEADLI ILFTKAMQLPNMLQDIKKI IGKKTKVLCLLNGLGHEDVIRQYIPEHNIL 120
E+ DLI+L TK+ L+ + 1+ K + VL + NG4G+ED I ++ 4-

[Remaining alignment rows not recoverable. Surviving start coordinates: Sbjct 60; Query 121 / Sbjct 116; Query 181 / Sbjct 176; Query 241 / Sbjct 23S; final match-line fragment "++M E++ E".]



A related DNA sequence was identified in S. pyogenes <SEQ ID 6071> which encodes the amino acid
sequence <SEQ ID 6072>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1392 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 262/307 (85%), Positives = 288/307 (93%)







Query: 1 MLWIAGSGAMGCRFGYQISKTHHDVILLDNVJADHIMAIKENGLKVTGDTEDLVKLPIMK 60
MLWIAGSGAMGCRFGYQISKTN+DVILLDNW DHI AIKENGL VTGD E4 VKLPIMK
Sbjct: 1 MLVYIAGSGAMGCRFGYQISKTNNDVILLDNWEDHINAIKENGLWTGDVEETVKLPIMK 60

Query: 61 PTDATEEADLIILFTKAMQLPNMLQDIKKIIGKKTKVLCLLNGLGHEDVIRQYIPEHNIL 120
PT4AT+EADLI ILFTKAMQLP MLQDIK IIGK+TKVLCLLNGLGHEDVIRQYIPEHNIL
Sbjct: 61 PTEATQEADLIILFTKAMQLPQMLQDIKGIIGKETKVLCLLNGLGHEDVIRQYIPEHNIL 120

Query: 121 MGVTVWTAGLKGPGHAHLEGVGSVNLQSIDPNNQEAGHRVTELLNEAKLQATYDENVLPN 180
MGVTVWTAGL+GPG AHL+GVG+ +NLQS+D P +NQEAGH+V +LLNEA L ATYDENV+PN
Sbjct: 121 MGVTVWTAGLEGPGRAHLQGVGAl^QSMDPSNQEAGHQVADLLNEANIJJATYDEHWPN 180

Query: 181 IWRKACVNGT^STCALLDCTIGQLFASEDGVNMVHEIIHEFVTVGKAEGVELDEEEITK 240
IWRKACVNGTMHSTCALLDCTIG+LFASEDG+ MV EIIHEFV VG+AEGVEL+EEEIT+
Sbjct: 181 IWRKACTNGT^STCALLDCTIGELFASEDGLKMVKEIIHEFVIVGQAEGVELNEEEITQ 240

Query: 241 YVMDTSVKAAHHYPSMHQDLVQNQRLTEIDFLNGAVNKKGENLGIDTPYCRLITQLIHTK 300
YVMDTSVKAAHHYPSMHQDLVQN RLTEIDF+NGAVN KGE LGI+TPYCR+IT+L+H K
Sbjct: 241 YVMDTSVKAAHHYPSMHQDLVQNHRLTEIDFINGAVNTKGEKLGINTPYCRMITELVHAK 300

Query: 301 ENVLSIK 307
E VL+I +
Sbjct: 301 EAVLNIQ 307





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1959 

A DNA sequence (GBSx2068) was identified in S.agalactiae <SEQ ID 6073> which encodes the amino
acid sequence <SEQ ID 6074>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 61 - 77 ( 61 - 78)
INTEGRAL Likelihood = -1.33 Transmembrane 80 - 96 ( 79 - 96)
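
Predicted transmembrane segments such as those listed above can be collected into records for comparison between proteins. This is a minimal sketch, assuming each line has the form `INTEGRAL Likelihood = <value> Transmembrane <start> - <end> ( <outer start> - <outer end>)` and that a more negative likelihood marks a stronger integral-membrane prediction, as suggested by the 0.0 threshold quoted elsewhere in this output. The names `Segment` and `parse_segments` are hypothetical.

```python
import re
from typing import NamedTuple

class Segment(NamedTuple):
    likelihood: float   # score from the INTEGRAL line
    start: int          # first residue of the core segment
    end: int            # last residue of the core segment
    outer_start: int    # first residue of the extended range in parentheses
    outer_end: int      # last residue of the extended range in parentheses

INTEGRAL_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one Segment per complete INTEGRAL line; damaged lines are skipped."""
    segments = []
    for m in INTEGRAL_LINE.finditer(text):
        likelihood, start, end, outer_start, outer_end = m.groups()
        segments.append(
            Segment(float(likelihood), int(start), int(end),
                    int(outer_start), int(outer_end))
        )
    return segments
```

Lines in which a value is missing or illegible do not match the pattern and are left out rather than guessed.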

Final Results 

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
45 vaccines or diagnostics. 
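
As an illustration of how the "Final Results" lines of the localisation analysis above can be read programmatically, the following is a minimal Python sketch. It is not part of the original analysis software: the function names and the regular expression are hypothetical, and the pattern simply assumes lines of the form "bacterial <compartment> --- Certainty=<value> (<label>)", tolerating stray spaces inside the number.

```python
import re

# Hypothetical pattern for lines such as
#   "bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>"
# Spaces inside the number (e.g. "0 .2211") are tolerated and removed later.
RESULT_LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)[\s\-\u2014]*Certainty\s*=\s*"
    r"(?P<value>[\d.\s]+?)\s*\((?P<label>[^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, label)} for every result line found."""
    results = {}
    for match in RESULT_LINE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site")] = (value, match.group("label").strip())
    return results


def most_likely_site(results):
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


if __name__ == "__main__":
    sample = (
        "bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>\n"
        "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>\n"
        "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>\n"
    )
    parsed = parse_final_results(sample)
    print(most_likely_site(parsed), parsed)
```

For the values listed in Example 1959, this sketch would select the membrane compartment, which is the entry marked "Affirmative".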

Example 1960 

A DNA sequence (GBSx2069) was identified in S.agalactiae <SEQ ID 6075> which encodes the amino 
acid sequence <SEQ ID 6076>. This protein is predicted to be regulatory protein (pfoS/R). Analysis of this 
protein sequence reveals the following: 

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.82 Transmembrane 317 - 333 ( 304 - 335)
INTEGRAL Likelihood = -7.64 Transmembrane 187 - 203 ( 183 - 217)
INTEGRAL Likelihood = -5.26 Transmembrane 24 - 40 ( 18 - 44)
INTEGRAL Likelihood = -5.04 Transmembrane 143 - 159 ( 139 - 161)
INTEGRAL Likelihood = -2.34 Transmembrane 116 - 132 ( 115 - 136)
INTEGRAL Likelihood = -2.13 Transmembrane 55 - 71 ( 55 - 71)
INTEGRAL Likelihood = -0.96 Transmembrane 268 - 284 ( 268 - 284)

Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC65034 GB:AE001189 regulatory protein (pfoS/R) [Treponema pallidum]
Identities = 138/358 (38%), Positives = 220/358 (60%), Gaps = 18/358 (5%)



Query: 2 TNTVTPKETAGSFINKVLGGTATAIWALIPNAILATFLKPFLSYG-LAAEFLHIVQVFQ 60 

T +++P++ F+ K+L G++ IV+ L+P AI + LA H+V Q 

Sbjct: 3 TQSLSPRQ FMMKILNGSSAGIVIGLVPPAIAGELFRALAPLSPLFAALYHWLPIQ 58 

Query: 61 FFTPIMAGFLIGQQFKFTPMQQLAVGGAAYIGSGAWAYTEVIQKGVATGSFQLRGIGDLI 120 

F P + G L+G OF + + + + I SG + G++ + GIGD+I 

Sbjct: 59 FSVPALIGTLVGLQFHCSAPEVATLAFVSVIASG NVTLQNGAWLITGIGDVI 110 

Query: 121 NmLTAALAVLAVKWFGNKFGSLTIILLPIIIGTGVGYLGWKLLPYVSYVTTLIGQGINS 180 

N+ML +ALA++ V+ K GSLTII LP+I+ G +G LPYV +T +G+ I + 
Sbjct: 111 ITOILISALAIILVRALRGKLGSLTIIALPV'IVAWAGGVGSFSLPYVKMITLFVGRVIAT 170 

Query: 181 FTTLQPIAMS I LI AMAFSMLI VS P I STVAIGLAIGLNGMSASAASMGVASTTAVLVWATM 240 

F LQP+ MSIL++M+FS++I+SP+S+VA+G+A+GL G+++ AA++GV+S L+ TM 
Sbjct: 171 FIALQPLLMSILLSMSFSLIIISPVSSVAVGIAVGLTGLASGAANIGVSSCAMTLIVGTM 230 

Query: 241 KANKSGVPIAIALGAMI<MMMPNFLKHPVMAIPmM^TVSSLTVPLFKLVGTPASSGFGL 300 

+ NK GVP+A+ GAMKM+MPN++++P++ IP+L+ V + LF L GTPAS+GFG 
Sbjct: 231 RVNKIGVPIJ^FAGAMKMLMPNWIRYPIIiNIPLLIjNGLVCGvIiAWLFNLQGTPASAGFGF 290 

Query: 301 VGAVGPIASFE- -AGASML IVILSWLVIPFAVGFVSHKICKDILKLYKDDIFVFE 353 

+G VGPI ++ A M+ 1+ L + V+ F ++ ID LKLY+ ++F+ E 
Sbjct: 291 IGLVGPINAYRLMAYTPMVRAGILFLVYFVLSFLAAYLIDFILVDRLKLYRRELFIPE 348 



There is also homology to SEQ ID 1280. 

A related GBS gene <SEQ ID 8939> and protein <SEQ ID 8940> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -7.24
GvH: Signal Score (-7.5): -2.94

Possible site: 49
>>> Seems to have no N-terminal signal sequence
ALOM program count: 7 value: -9.82 threshold: 0.0



INTEGRAL Likelihood = -9. Transmembrane - 333 ( 304 - 335)
INTEGRAL Likelihood = -7 Transmembrane - 203 ( 183 - 217)
INTEGRAL Likelihood = -6 Transmembrane - 159 ( 136 -
INTEGRAL Likelihood = -5 Transmembrane - 40 ( 18 -
INTEGRAL Likelihood = -2 Transmembrane - 132 ( 115 -
INTEGRAL Likelihood = -2 - 71 ( 55 -
Likelihood = -0 - 284 ( 268 -
Likelihood = 0
modified ALOM score: 2.46



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF02147(337 - 1359 of 1668)

EGAD|138195|TP0038(10 - 348 of 350) regulatory protein {Treponema pallidum} OMNI|TP0038
regulatory protein (pfoS/R) GP|3322295|gb|AAC65034.1||AE001189 regulatory protein (pfoS/R)
{Treponema pallidum} PIR|E71373|E71373 probable regulatory protein (pfoS/R) - syphilis spirochete

%Match = 21.6
%Identity = 40.1 %Similarity = 65.6
Matches = 135 Mismatches = 112 Conservative Sub.s = 86

87 117 147 177 207 237 267 297 

LQQDMGKHQSL*TKLSIIFILIEITV*SIQHH**^ 

327 357 387 417 444 474 504 534 

FMTNTvTPKETAGSFINI<VLGGTATAIWALIP^3AILATFIJKPFLSYG-LAAEFLHIVQVFQFFTPIMAGFLIGQQFKFT
I: 1 = 1 h = 11= |s| II = = = : 11 = hi 111 = 1 hi II = 

MHTQSLSPRQFmKILMGSSAGIVIGLVPPAIAGEjFRALAPLSPLFAALYHWLPlQFSVPALIGTLVGLQFHCS 
10 20 30 40 50 60 70 

564 594 624 654 684 714 744 774

PMQQLAVGGAAYIGSGAWAYTEVIQKGVATGSFQLRGIGDLI^LTAAIAVLAVKWFGNKFGSLXIILLPIIIGTGVGY 
= = = I II = h= = lllhlhll HI!- h = hill II Ihh I 

APEVATIAFVSVIASG NVTLQNGAWLITGIGDVINVKLISAIAIILVRALRGKLGSLTIIALPVIVAWAGG 

90 100 110 120 130 140 


804 834 864 894 924 954 984 1014 

LGWKLLPWSWrTLIGQGINSFTTLQPIAMSILIAMAFSMLIVSPISTVAIGIAIGLNGMSASAASMGVASTTAVLVWA 
= 1 Mil ■ I =:h I =1 Nh lllh:hlh = hlhhlhhhll |::» Ih'lhl 1 = 
VGSFSLPYVKMITLFVGRVIATFIALQPLLMSILLSMSFSLIIISPVSSVAVGIAV3LTGIASGAANIGVSSCAMTLIVG 
160 170 180 190 200 210 220

1044 1074 1104 1134 1164 1194 1224 1248 

TMKANKSGVPIAIALGAMKM^PNFLKHPVmiPMLMTAWSSLTVPLFKLVGTPRSSGFGLVGAVGPIASFE--AGASM 
Ih II 111 = 1= llllhllh = = = h= Ihh I = III lllll = lll = = l I I I I == I I 
TMRVNKIGVPIAMFAGAMKMLMPNWIRYPILNIPLLLNGLVCGV^

240 250 260 270 280 290 300 

1269 1299 1329 1359 1389 1419 1449 1479 

L 1 VI LSWLVI PFAVGFVSHKI CKDI LKLYKDD I FVFEGQN* FGGCMLVYI AGSGAMGCRFGYQI SKTNHDVI LLDNW 

: |: | ::h I == I I 1111= = = l= I

VRAGILFLVYFVLSFLARYLIDFILVDRLKLYRRELFIPEQG 
320 330 340 350 



There is also homology to SEQ ID 1276.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1961 

A DNA sequence (GBSx2070) was identified in S.agalactiae <SEQ ID 6077> which encodes the amino 
acid sequence <SEQ ID 6078>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07127 GB:AP001518 thioredoxin reductase [Bacillus halodurans] 
Identities = 163/325 (50%), Positives = 222/325 (68%), Gaps = 3/325 (0%)

Query: 5 IYDITIVGGGPVGLFAAFYAGLRGVSVKIIESLSELGGQPAILYPEKKIYDIPGYPVITG 64

+YDITI+GGGP GLFAAFY G+R VKIIES+ +LGGQ A LYPEK IYD+ G+P + 
Sbjct: 7 LYDITIIGGGPTGLFAAFYGGMRQAKOTIIESMPQJ^QIAALYPEKYIYDVAGFPKVKA 66 

Query: 65 RELIDKHIEQLERFKDSIEICLKEEVLSFEK-VDDVFTIQTDKDQHLSRAIVFACGNGAF 123 

++L++ Q E+F +1 L++ V + K DD FTI+TDK+ H S+AI+ G GAF 
Sbjct: 67 QDL VNDLKRQAEQFNPTI - -ALEQSVQNVTKETDDTFTI KTDKETHYSKAI I ITAGAGAF 124 

Query: 124 APRLLGLEI^ElTOADKl^FYl^KLEQFAGKHWICGGGDSAVDWANELDKIAASVAIVH 183 

PR L 4E 4 Y NL Y V L +AGK+V+I GGGDSAVDWA L+ +A +V ++H 
Sbjct: 125 QPRRLEVEGAKQYEGICNLQYFVOTJmAYAGKNVLISGGGDSAVDWALMLEPVAKNVTLIH 184 

Query: 184 RRDAFRAHEHSVDILKASGVRILTPYVPIGLNGDSQRVSSLWQKVKGDEVIELPLDNLI 243 

RRD FRAHEHSV++L+ S V ILTP+ L+GD +++ + +Q+VKGD V L +D +1 
Sbjct: 185 RRDKFRAHEHSVELLQKSSVNILTPFAISELSGDGEKIHHVTIQEVKGDAVETLDVDEVI 244 

Query: 244 VSFGFSTSNKNLRYWNLDYKRSSINVSSLFETTQEGVYAIGDAANYPGKVELIATGYGEA 303 

V+FGF +S ++ W L+ +++SI V++ ET G+YA GD YPGKV+LIATG+GEA 
Sbjct: 245 AmFGFVSSLGPIKGWGLEIEKNSIVVNTKMETNIPGIYAAGDICTYPGKVIOjIATGFGEA 304 

Query: 304 PVAINQAINYIYPDRDNRWHSTSL 328 

P A+N A +1 P HSTSL 
Sbjct: 305 PTAVNNAKAFIDPTARVFPGHSTSL 329 
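
The summary line that heads the alignment above gives each statistic as a fraction followed by a whole-number percentage. The following minimal Python sketch shows one way such a line could be re-checked after transcription. The function names are hypothetical, and the assumption that the printed percentage is the fraction truncated to a whole number is only an observation about the summary line of this alignment, not a documented property of the alignment software.

```python
import re

# Hypothetical pattern for fields such as "Identities = 163/325 (50%)".
# A "-" is accepted in place of "=" to tolerate transcription damage.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=\-]\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (numerator, denominator, reported_percent)}."""
    fields = {}
    for name, num, den, pct in SUMMARY_FIELD.findall(line):
        fields[name] = (int(num), int(den), int(pct))
    return fields


def truncated_percent(numerator, denominator):
    """Whole-number percentage obtained by truncation (assumed convention)."""
    return (100 * numerator) // denominator


def inconsistent_fields(line):
    """List the fields whose reported percentage differs from the recomputed one."""
    return [
        name
        for name, (num, den, pct) in parse_summary(line).items()
        if truncated_percent(num, den) != pct
    ]


if __name__ == "__main__":
    header = "Identities = 163/325 (50%), Positives = 222/325 (68%), Gaps = 3/325 (0%)"
    print(parse_summary(header))
    print(inconsistent_fields(header))
```

A field flagged by such a check may indicate either a transcription error or a different rounding convention, so the result is a prompt for manual inspection rather than a correction.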

A related DNA sequence was identified in S.pyogenes <SEQ ID 6079> which encodes the amino acid 
sequence <SEQ ID 6080>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -0.37 Transmembrane

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 4 KAYDITIIGGGPIGLFAAFYAGLRGVTVKIIESLSELGGQPAILYPEKMIYDIPAYPSLT 63 

K YDITI IGGGP+GLF AFY G+R +VKIIESL +LGGQ + LYPEK IYD+ +P + 

Sbjct: 6 KVYDITI1GGGPVGLFTAFYGGMRQASVKIIESLPQLGGQLSALYPEKYIYDVAGFPKIR 65 

Query: 64 GVELTENLIKQLSRFEDRTTICLKEEVLTFDKVKGG-FSIRTNKAEHFSKAIIIACGNGA 122 

EL NL +Q+++F+ TICL++ V + +K GF+ K K I GNGA 

Sbjct: 66 AQELINNLKEQMAKFDQ--TICLEQAVESVEKQAEGVFKLVQMKKPTTLKRSCITAGNGA 123 

Query: 123 FAPRTLGLESEENFADHNLFYNVHQLDQFAGQKWICGGGD3AVDWALALEDIAESVTW 182 

F PR L LE4 E + NL Y V L +FAG++V I GGGDSAVDWAL LE IA+ V+++ 

Sbjct: 124 FKPRECLELENAEQYEGKNLHYFVDDLQKFAGRRVAILGGGDSAVDWALMLEPIAKEVSII 183 



Query: 243 IVSFGFSTSNKNLKNWl^DYKRSSITVSPLFQTSQEGIFAIGDAAAYNGKVDLIATGFGE 302 

IV++GF 4S +KNW LD 44+SI V +T4 EG FA GD Y GKV4LIA4GFGE 
Sbjct: 243 IVNYGFVSSLGPIKNWGLDIEKNSIWKSTMETNIEGFFAAGDICTYEGKVNLIASGFGE 302 

Query: 303 APTAVNQAINYIYPDRDNRWHSTSLID 330 

APTAVN A Y+ P + +HSTSL 4 
Sbjct: 303 APTAVNNAKAYMDPKARVQPLHSTSLFE 330 



An alignment of the GAS and GBS proteins is shown below.

Identities = 242/324 (74%), Positives = 279/324 (85%)

Query: 6   YDITIVGGGPVGLFAAFYAGLRGVSVKIIESLSELGGQPAILYPEKKIYDIPGYPVITGR 65
           YDITI+GGGP+GLFAAFYAGLRGV+VKIIESLSELGGQPAILYPEK I YD IP YP +TG
Sbjct: 6   YDITIIGGGPIGLFAAFYAGLRGVTVKIIESLSELGGQPAILYPEKMIYDIPAYPSLTGV 65

Query: 66  ELIDKHIEQLERFKDSIEICLKEEVLSFEKVDDVFTIQTDKDQHLSRAIVFACGNGAFAP 125
           EL + I+QL RF+D ICLKEEVL+F+KV F+I+T+K *H S+AI + ACGNGAFAP
Sbjct: 66  ELTENLIKQLSRFEDRTTICLKEEVLTFDKVKGGFSIRTNKAEHFSKAIIIACGNGAFAP 125

Query: 126 RLLGLENEENYADNNLFYNVTKLEQFAGKHWICGGGDSAVDWANELDKIAASVAIVHRR 185
           R LGLE+EEN+AD+NLFYNV +L+QFAG+ WICGGGDSAVDWA L+ IA SV +VHRR
Sbjct: 126 RTLGLESEENFADHNLFYNVHQLDQFAGQKV^/ICGGGDSAVDWAIALEDIAESVTVVHRR 185

Query: 186 DAFRAHEHSVDILKASGVRILTPYVPIGLNGDSQRVSSLWQKVKGDEVIELPLDNLIVS 245
           DAFRAHEHSV++LKAS V +LTPYVP L G LV+QKVK DEV+EL LD+LIVS
Sbjct: 186 DAFPJfflEHSVELLKASTVISrLLTPWPKALKGIGNIAEKLVIQKVKEDEVLELELiDSIjIVS 245

Query: 246 FGFSTSNKNLRYWNLDYKRSSINVSSLFETTQEGVYAIGDAANYPGKVELIATGYGEAPV 305
           FGFSTSNKNL+ WNLDYKRSSI VS LF+T+QEG++AIGDAA Y GKV+LIATG+GEAP
Sbjct: 246 FGFSTSNKNLKNWNLDYKRSSITVSPLFQTSQEGIFAIGDAAAYNGKVDLIATGFGEAPT 305

Query: 306 AINQAINYIYPDRDNRWHSTSLI 329
           A+NQAINYIYPDRDNRWHSTSLI
Sbjct: 306 AVNQAINYIYPDRDNRWHSTSLI 329





SEQ ID 6078 (GBS178) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 38 (lane 5; MW 37.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 8; MW 62.4kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1962 

A DNA sequence (GBSx2071) was identified in S.agalactiae <SEQ ID 6081> which encodes the amino
acid sequence <SEQ ID 6082>. This protein is predicted to be tRNA methyltransferase (trmD). Analysis of
this protein sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1496 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06198 GB:AP001515 tRNA methyltransferase [Bacillus halodurans] 
Identities = 144/246 (58%) , Positives = 186/246 (75%) , Gaps = 6/246 (2%) 



Query: 2   MKIDILTLFPEMFAPLEHS-IVGKAKERGLLEINYHNFRENAE-KSRHVDDEPYGGGQGM 59
           MKID LTLFPEMF + HS 1+ +A+ERG + NFRE +E K + VDD PYGGG GM
Sbjct: 1   MKIDFLTLFPEMFC^VLHSSILKQAQERGAVSFR^VNFREYSENKHKKVDDYPYGGGAGM 60

Query: 60  LLRAQP I FDTIDKIDAQKA RVIIiLDPAGRTFDQDFAEEIjSKEDEIjI FI CGHYEGYDE 116
           +L QP+FD ++ + + + RVTIrt- P G TF Q AEEL++ + LI +CGHYEGYDE
Sbjct: 61  VLSPQPLFDAVEDLTKKSSSTPRVILMCPQGETFTQRKAEELAQAEHLILLCGHYEGYDE 120

Query: 117 RIKS-LV^DEVSLGDFVLTGGELAAMT^m3ATVRLIPEVIGKETSHQDDSFSSGLLEYPQ 175
           RI+S LVTDE + S +GD+ VLTGGEL AM + D+ RL+P V+G ETS Q DSFS+GLLEYPQ
Sbjct: 121 RIRSYLVTDELSIGDYVLTGGELGAMVIADSVTRLLPAVLGNETSAQTDSFSTGLLEYPQ 180

Query: 236 KTEIER 241
Sbjct: 241 RKQQEK 246 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6083> which encodes the amino acid 
sequence <SEQ ID 6084>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2705 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 195/240 (81%), Positives = 224/240 (93%)

Query: 2   MKIDILTLFPEMFAPLEHSIVGKAKERGLLEIItfYHNFRENAEKSRHvDDEPYGGGQGMLL 61
           MKIDILTLFPEMFAPLEHSIVGKAKE+GLL+I+YHNFR+ AEK+RHVDDEPYGGGQGMLL
Sbjct: 1   MKIDILTLFPEMFAPLEHSIVGKAKEKGLLDIHYHNFRDYAEKARHVDDEPYGGGQGMLL 60

Query: 62  RAQPIFDTIDKIDAQKARVILLDPAGRTFDQDFAEELSKEDELIFICGHYEGYDERIKSL 121
           RAQPIFDTI++I+A+K R+ILLDPAG+ F Q +AEEL+ E+ELIFICGHYEGYDERIK+L
Sbjct: 61  RAQPIFDTIEQIEAKKPRIILLDPAGKPFTQAYAEELALEEEL-FICGHYEGYDERIKTL 120

Query: 122
           VTDE+SLGDFVLTGGEIARMTMVDATVRLIP+V+GKK+SHQDDSFSSGLLEYPQYTRPYD
Sbjct: 121

Query: 182
Sbjct: 181



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1963 

A DNA sequence (GBSx2072) was identified in S.agalactiae <SEQ ID 6085> which encodes the amino 
acid sequence <SEQ ID 6086>. This protein is predicted to be 16S rRNA processing protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.71 Transmembrane 32 - 48 ( 32 - 52) 

Final Results 

bacterial membrane --- Certainty=0.2084 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9381> which encodes amino acid sequence <SEQ ID 9382> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB13475 GB:Z99112 similar to hypothetical proteins [Bacillus subtilis]
Identities = 88/174 (50%), Positives = 128/174 (72%), Gaps = 1/174 (0%)

Query: 54 VTMEYEWGKIVOTQGLQGEMRVIjSVTDFVEERFKKGQvXALFDEKNQFVMDIEIASHRK 113 

+T +FNVGKIVNT G++GE+RV+S TDF' EER+K G h LF + +++ + +HR 

Sbjct: 1 MTKRWFNVGKIVNTHGIKBEWVISKTDFJffiERYKPGNTLyLFMDGRN3PVEOTVNTHRL.' SO 

Query: 114 QKNFDIIKFKGMYHINDIEKYKGFTLKVflEDQLSDLKDGEFYYHEIIGLDVYEGE-ELIG 172 

K F +++FK +4N++E+ K -t-KV E++L +L +GEFY+HEIIG +V+ E ELIG 
Sbjct: 61 HKQFHLLQFKERQNLNEVEELKNAIIKVPEEELGELNE3EFYFHEIIGCEVFTEEGELIG 120 

Query: 173 KI KE I LQPGAND VWWERHGKRDLLLPYI PPWLE VDLSNQRVQVELMEGLDDE 226 

K+KEIL PGANDVWV+ R GIC+D L+PYI W +D+ +++++ELMEGL DE 
Sbjct: 121 ron^ILTPGANDVWlGRKGKXDALIPYIESWKHIDVREKKIEIELMEGLIDE 174 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6087> which encodes the amino acid 
sequence <SEQ ID 6088>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2787 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 133/172 (77%), Positives = 153/172 (88%) 

Query: 56 MEYFNVGKIVNTQGLQGEMRVLSVTDFVEERFKICGQVIALFDEKNQFVMDIEIASHRKQK 115 

MEYFNVGKIVNTQGLQGEMRVLSV+DF EERFKKG LRLFD+K++FV ++ I SHRKQK 
Sbjct: 1 MEYFNVGKIVNTQGLQGEMRVLSVSDFAEERFKKGSQLALFDDKDRFVQEVTIVSHRKQK 60 

Query: 116 NFDIIKFKGMYIIINDIEICYKGFTIiKVAEDQLSDLKDGEFYYIIEIIGLDVYEGEELIGKIK 175 

+FD1IKFK MYHIN IEKYKG+TLKV++D DL++GEFYYH+ 1 IG+ VYE + LIG +K 
Sbjct: 61 HFDIIKFKE^HINAIEKYKGYTLKVSKDNCGDLQEGEFYYHQIIGMAVYEKDVLIGHVK 120 

Query: 176 EILQPGANDVWWERHGKRDLLLPYIPPWLEVDLSNQRVQVELMEGLDDED 227 

EILQPGANDVW+V+R GKRDLLLPYIPPWL VD+ N+RV VELMEGLDDED 
Sbjct: 121 EILQPGANDVWIVKRQGKRDLLLPYIPPVVLNVDVPNKRVDVELMEGLDDED 172 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1964 

A DNA sequence (GBSx2073) was identified in S.agalactiae <SEQ ID 6089> which encodes the amino 
acid sequence <SEQ ID 6090>. This protein is predicted to be similar to E. coli ykfC (11). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3488 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9909> which encodes amino acid sequence <SEQ ID 9910> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC38715 GB:AF0303S7 maturase-related protein [Streptococcus pneumoniae]
Identities = 366/425 (86%), Positives = 396/425 (93%)

Query: 12    Sbjct: 1
Query: 72    Sbjct: 61
Query: 132   Sbjct: 121
Query: 192   Sbjct: 181
Query: 252   Sbjct: 241
Query: 312   Sbjct: 301
Query: 372   Sbjct: 361
Query: 432   Sbjct:

I++LLEYLNDGYEWI VDIDLEKFFDTVPQDRLMSLVHNII+DGDTESLIRKYLHSGV+IN
GQR+KTLVGTPQGGNLSPLLSNIMLNELDK LEKRGLRFVRYADDCVITVGSEAAAKRVM
+SVS +IEKRLGLKVNMTKTKI RP +LKYLGFGFWKS GWK RPHQDSV+ FK KLK+
LT RKWSIDL RIE+LN IRGWINYFSLGNMKSI + IDERLRTR+R+ 1 IWKQWKKK+
+RLWGLLKLGV +WIADKVSGWGDHYQLVAQKSVLKRAISK? L KRGLVSCLDYYLERH

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1965 

A DNA sequence (GBSx2074) was identified in S.agalactiae <SEQ ID 6091> which encodes the amino 
acid sequence <SEQ ID 6092>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.37 Transmembrane 7 - 23 ( 7 - 23) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 821> which encodes the amino acid
sequence <SEQ ID 822>. Analysis of this protein sequence reveals the following: 

Possible site: 25

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -2.87 Transmembrane 1157 - 1173 (1157 - 1174)

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

3 GTWAVIDAGFDKNHKAWRLTDKTKARYQSKE+LEKAKKEHGITYGEWVNDKVAYYHD



An alignment of the GAS and GBS proteins is shown below. 

Identities = 1031/1064 (96%) , Positives = 1042/1064 (97%) 

Query: 1 MRKKQKLPFDKIAIALISTSILLNAQSDIKAMTVTEDTPATEQAVEPPQPIAVSEESPSS 60 

+RKKQKLPFDKLAIAL+STSIXjimQSDIKMITVTEDTPATEQAVE PQP AVSEE+PSS 
Sbjct: 1 LRKKQKLPFDKLAIALMSTS ILLNAQSD I KANTVTEDTPATEQAVETPQPTAVSEEAPSS 60 

Query: 61 KETKTSQTPSDVGETVADDANDLAPQAPAI<;TADTPATSI<ATIRDLNDPSHVKTLQEKAGK 120 

KETKT QTP D ET+ADDANDLAPQAPAKTADTPATSKATIRDLNDPS VKTLQEKAGK 
Sbjct: 61 KETKTPQTPDDAEETIADDAHDLAPQAPAKTADTPATSKATIRDLNDPSQVKTLQEKAGK 120 

Query: 121 

Sbjct: 121 

Query: 181 



Query: 241 NYAQAIRDAWLGAKVINMSFGNAAIAYANLPDETKKAFDYAKSKGVSIVTSAGNDSSFG 300 

NYAQAI DAVNLGAKVINMSFGNAALAYANLPDETKKAFDYAKSKGVS IVTSAGNDSSFG 
Sbjct: 241 OTAQAIIDAVNLGAKVINMSFGNAALAYANLPDETKKAFDYAKSKGVSIVTSAGNDSSFG 300 

Query: 301 GKPRLPLADHPDYGWGTPAAADSTLTVASYSPDKQLTETATVKTDDHQDKEMPVLSTNR 360 

GK RLPLADHPDYGWGTPAAADSTLTVASYSPDKQLTETATVKT D QDKEMPVLSTNR 
Sbjct: 301 GKTRLPIiADHPDYGWGTPAAADSTLTVASYSPDKQLTETATVKTADQQDKEMPVLSTNR 360 

Query: 361 FEPNKAYDYAYANRGTKEDDFKDVEGKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD 420 

FEPNKAYDYAYANRG KEDDFKDV+GKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD 
Sbjct: 361 FEPNKAYDYAYANRGMKEDDFKDVKGKIALIERGDIDFKDKIANAKKAGAVGVLIYDNQD 420 

Query: 421 KGFPIELPNVDQMPAAFISRRDGLLLKDNPQKTITFNATPKVLPTASGTKLSRFSSWGLT 480 

KGFPIELPNVDQMPAAFISR+DGLLLK+NPQKTITFNATPKVLPTASGTKLSRFSSWGLT 
Sbjct: 421 KGFPIELPNVDQMPAAFISRKDGLLLKENPQKTITFKATPKVLPTASGTKLSRFSSWGLT 480 

Query: 481 ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE 540 

ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE 
Sbjct: 481 ADGNIKPDIAAPGQDILSSVANNKYAKLSGTSMSAPLVAGIMGLLQKQYETQYPDMTPSE 540 

Query: 541 RLDLAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKlQ^SAATMYVTDKDlSrrSSKVHIjN 600 

RLDIAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKKASAATMYVTDKDNTSSKVHIiN 
Sbjct: 541 RLDLAKKVLMSSATALYDEDEKAYFSPRQQGAGAVDAKI^ASAATWYVTDICDOTSSKVHLN 600 

Query: 601 NVSDKFEVTVTVHNKSDKPQELYYQ\T7VQTDKA/TGKHFALAPKALYETSWQKITIPANSS 660 

NVSDKFEVTVTVHNKSDKPQELYYQ TVQTDKVDGK FALAPKALYETSWQKITIPANSS 
Sbjct: 601 NVSDKFEVTVTVHNKSDKPQELYYQATVQTDKA/T1GKLFALAPKALYETSWQKITIPANSS 660 

Query: 661 KQVTVPIDASRFSKDLLAQMKNGYFLEGFVRFKQDPTKEELMSIPYIGFRGDFGNLSALE 720 

KQVT+PID S+FSKDLLA MKNGYFLEGFVRFKQDPTKEELMSIPYIGFRGDFGNLSALE 
Sbjct: 661 KQVTIPIDVSQFSKDLLAPMKNGYFLEGFWFKQDPTKEELMSIPYIGFRGDFGNLSALE 720 

Query: 721 KPIYDSKDGSSYYHEANSDAKDQLDGDGLQ7YALI<NNFTALTTESNPWTIII<AVKEGVEN 780 

KPIYDSKDGSSYYHFANSDAKDQLDGDGLQFYALKNNFTALTTESNPWTIIKAVKEGVEN 
Sbjct: 721 KPIYDSKDGSSYYHFJWSDAIOQLDGDGLQFYALKiOTFTALTTESNPWTIIKAVKEGVEN 780 

Query: 781 IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR 840 

IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR 
Sbjct: 781 IEDIESSEITETIFAGTFAKQDDDSHYYIHRHANGKPYAAISPNGDGNRDYVQFQGTFLR 840 

Query: 841 NAKNLVAEVLDKEGNWWTSEVTEQWKNYl^ 900 

NAKNLVAEVLDKEGNVVWTSEVTEQVWNYl^LASTLGSTRFEKTRWDGK+KDGKVVAN 
Sbjct: 841 NAK^VAEVLDKEGNWWTSEVTEQWKtmnro^ 900 

Query: 901 GTYTYRVRYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTEDSRLTLASKPKTSQPVY 960 

GTYTYRWYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTED RLTLASKPKTSQPVY
Sbjct: 901 GTYTYRVRYTPISSGAKEQHTDFDVIVDNTTPEVATSATFSTEDRRLTLASKPICrSQPVY 960

Query: 961 RERIAYTYMDEDLPTTEYISPNEDGTFTLPEEAETMEGAOTPLKMSDFTYWEDMAGNIT 1020

RERIAYTYMDEDLPTTEYISPNEDGTFTLPEBAETMEGAOTPIiKMSDFTYWEDMAGNIT 
Sbjct: 961 RERIAYTYMDEDLPTTEYISPNEDGTFTLPEERETMEGATVPLKMSDFTYVVEDMAGNIT 1020 

Query: 1021 YTPVTKLLEGHSNKPEQDGSDQAPDKKPEAKPEQDGSGQTPDKK 1064 

YTPVTKLLEGHSNKPEQDGSDCAPDKKPE KPEQDGSGQ PDKK 
Sbjct: 1021 YTPVTKLLEGHSNKPEQDGSQQAPDKKPETXPEQDGSGOAPDKK 1064 

A related GBS gene <SEQ ID 8941> and protein <SEQ ID 8942> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: S.69
GvH: Signal Score (-7.5): -3.33

Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -0.37 threshold: 0.0

INTEGRAL Likelihood = -0.37 Transmembrane 7 - 23 ( 7 - 23)
PERIPHERAL Likelihood = 2.81 508
modified ALOM score: 0.57

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8942 (GBS276) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 46 (lane 2; MW 123kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 60 (lane 5; MW 46.5kDa). 

The GBS276-His fusion product was purified (Figure 206, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 296), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1966 

A DNA sequence (GBSx2075) was identified in S.agalactiae <SEQ ID 6093> which encodes the amino 
acid sequence <SEQ ID 6094>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4285 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1967

A DNA sequence (GBSx2076) was identified in S.agalactiae <SEQ ID 6095> which encodes the amino 
acid sequence <SEQ ID 6096>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -11.15 Transmembrane 19 - 35 ( 11 - 39)

Final Results

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9911> which encodes amino acid sequence <SEQ ID 9912>
was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 6096 (GBS654) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 142 (lane 8 & 10; MW 51.2kDa + lane 9; MW 27kDa). Purified GBS654-GST is
shown in Figure 245, lane 11.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1968 

A DNA sequence (GBSx2077) was identified in S.agalactiae <SEQ ID 6097> which encodes the amino
acid sequence <SEQ ID 6098>. Analysis of this protein sequence reveals the following:

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4174 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9913> which encodes amino acid sequence <SEQ ID 9914>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF27324 GB:AF178424 unknown [Lactococcus lactis]
Identities = 26/75 (34%), Positives = 45/75 (59%), Gaps = 4/75 (5%)

Query: 11 MAFEPKNSELTKVLKES-LDEEKKEIFSSEMNIRDFERTKQYQFTLQPSVRKKIDRLSKE 69
          MAF+ +++VLSL + KE+ I EKY FTL+PSV++ +++L+++
Sbjct: 1  MAFDVDDKKVKTVLSNSSLAKSKVEL PKKI3SEENKKSYSFTLEPSVKEGLEKLAEK 57

Query: 70 KGYRSASSFINDFFK 84
          + Y++ S F+ND K
Sbjct: 58 QNYKNTSQFLNDLIK 72

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1969 

A DNA sequence (GBSx2078) was identified in S.agalactiae <SEQ ID 6099> which encodes the amino
acid sequence <SEQ ID 6100>. This protein is predicted to be ParA. Analysis of this protein sequence
reveals the following:

Possible site: 45

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF27325 GB:AF178424 ParA [Lactococcus lactis] 
Identities = 49/104 (47%) , Positives = 72/104 (69%) 

Query: 22 LSERLEEFKTFAFDFKTRASYVTAKLFFLC3NMIKHNTNSSKELIRSLKNDKSVLAMIPHK 81
          h ERL+ FK E D +TR +Y+TA +F+GN I+HNT SS+E + DK +AMIP K
Sbjct: 157 LIERLQNFKDEVIDARTRETYITAIPyFVGNRIRHNTKSSREFSEKISQDKGTIAMIPEK 216

          ELFNRSTLD L M DK++++ + F+++++F F +IT+K+
Sbjct: 217 ELFNRSTLDGVPLVEMEKDKDVFNSNKVFYEKLNFAFNEITNKI 260

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1970

A DNA sequence (GBSx2079) was identified in S.agalactiae <SEQ ID 6101> which encodes the amino
acid sequence <SEQ ID 6102>. This protein is predicted to be transposase (orfA). Analysis of this protein
sequence reveals the following:

Possible site: 42
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2830 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1971 

A DNA sequence (GBSx2080) was identified in S.agalactiae <SEQ ID 6103> which encodes the amino
acid sequence <SEQ ID 6104>. This protein is predicted to be transposase (orfB). Analysis of this protein
sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2618 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

Query: 1   MCRWLNMPHSSYYYQAVESVSETEFEETIKRIFLDSESRyGSRKIKICLNNEGITLSRI?R 60
           MCRWLN+P SSYYY+AVE VSE E EE+IK IFL+S++RYGSRKIKI CLNNEGITLSRRR
Sbjct: 1   MCRWLN1PRSSYYYKAVEPVSEAELEESIKAIFLESICARYGSRKIKICLNNEGITLSRRR 60

Query: 61
           IRRIMKRLNLVSVYQKATFKPHSRGKNEAPI PNHLDRQFK ERPLQALVTDLTYVRVGNR
Sbjct: 61

Query: 121
           WAYVCLI IDLYNREI IGLSLGWHKTAELVKQAIQS I PY LTKVKMFHSDR KEF+NQLID
Sbjct: 121

Query: 181
           EILEAFGITRSIiSQAGCPYDNAVAESTYRAFKIEFVYQETFQ LEELALKTK
Sbjct: 181

Query: 241 HRIHGSLNYQTPMTKRLIA
Sbjct: 241 HRIHGSUtfYQTPMTKRLIA 259

There is also homology to SEQ ID 32. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1972 

A DNA sequence (GBSx2081) was identified in S.agalactiae <SEQ ID 6105> which encodes the amino 
acid sequence <SEQ ID 6106>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3325 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1973 

A DNA sequence (GBSx2082) was identified in S.agalactiae <SEQ ID 6107> which encodes the amino 
acid sequence <SEQ ID 6108>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4442 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9917> which encodes amino acid sequence <SEQ ID 9918> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD44095 GB:AF115103 orf359 gp [Streptococcus thermophilus 
bacteriophage Sfi21] 
Identities = 92/357 (25%), Positives = 162/357 (44%), Gaps = 33/357 (9%)

Query: 45 RIa^QYGKTFETMKFAYDELWIKYEFANlC^/SLENy^MTFE^rolNKIYLRAYKQK-VQSvT 103 

RK + F T EA ++ + + V+++ ++T +Y K + YK+ V +T 

Sbjct: 24 RKPKTKGGFRTKSEAIKAAAEMELKLQDNVNVDE-DITLYDYF-KQWCEVYKKPTVSKIT 81 

Query: 104 YKTALPHHKLFIQYFGLKPLKAITPRDCEAFRLHIIENYSENYAKNLWSRF KACMG 159 

YK + + +FG K LK+IT + + ++ +Y++ +A++ RF KAC+ 

Sbjct: 82 YKAYINSQRKIELFFGDKKLKSITATEYQ RVLNSYAKTHAQDTVERFNVHVKACIE 137 

Query: 160 YAERLGYISNMPCKALD NPRGKHPETPFWTYAEFQTFIKSFDLHDYEELQRFTAIWL 216 

A GYI CK +G+ ET F E++ I ++ + E + A+++ 

Sbjct: 138 MAVHEGYIKRNFCKFAKINAKNKGRDIETKFLEVEEYERLI - - YETSKHPEYASYAALYI 195 

Query: 217 YYMTGTRVSEGLSLCWEDIDFDKKFLKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDI 276
TG+R +E L L +DI D L V+ T + N + TKT + R I LDD 

Query: 277 TIEVLQVWRKNQFANQDTDFIISRFGDPFCKSTICRIIKRKAQQVGVPVITGKGLRHSHA 336

I + +Q D 1+ + T+ +1+ R+ + LRH++A 

Sbjct: 253 FINFI DQLPPTDDGRILPSLSNNAVNKTLRKIVGRE VKVHSLRHTYA 299 

Query: 337 SYLINVLKKDILYVARRMGHADKSTTLNTYSHWFNALDKTVSEEITQNIKSAGLDSI 393

SYLI D++ V++ +GH + + TIi Y+H E+I Q G +++ 

Sbjct: 300 SYLI-AHDIDLISVSQVLGHENI^ITLEVYAHQLQEQKSRNDEKIKQMWTECGRNAL 355 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6109> which encodes the amino acid 
sequence <SEQ ID 6110>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.5549 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 111/127 (87%), Positives = 119/127 (93%)

Query: 242 LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDITIEVLQVWRKNQFANQDTDFIISRF 301 

LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDD+TI VL+ WR+NQ  N DTDFIISRF
Sbjct: 1 LKVHTTLEKDENGNWYRKDQTKTPAGERLIELDDVTIVVLENWRRNQVVNTDTDFIISRF 60

Query: 302 GDPFCKSTICRIIKRKAQQVGVPVITGKGLRHSHASYLINVLKKDILYVARRMGHADKST 361 

G+PFCKSTICR+IK KAQ +GVPVITGKGLRHS+ASYLINVLKKDILYVA+ MGHADKST 
Sbjct: 61 GEPFCKSTICRVIKHKAQSIGVPVITGKGLRHSYASYLINVLKKDILYVAKCMGHADKST 120 

Query: 362 TLNTYSH 368 

TLNTYSH 
Sbjct: 121 TLNTYSH 127 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
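
The alignment summaries in these Examples report identities and positives as a ratio followed by a whole-number percentage. As an illustration only, the following minimal Python sketch recomputes the percentages from the ratios, which can help in spotting transcription errors in a summary line. The function name `summary_percentages` and the regular expression are assumptions introduced here for illustration and are not part of the alignment software; the sample line is the GAS/GBS summary from Example 1973.

```python
import re

# Hypothetical pattern for the "name = numerator/denominator" pairs of a summary line.
PAIR_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def summary_percentages(line):
    """Map each reported field to its exact percentage, computed from the ratio."""
    out = {}
    for name, num, den in PAIR_RE.findall(line):
        out[name] = 100.0 * int(num) / int(den)
    return out


line = "Identities = 111/127 (87%), Positives = 119/127 (93%)"
stats = summary_percentages(line)

if __name__ == "__main__":
    for name, pct in stats.items():
        print(f"{name}: {pct:.1f}%")
```

Because the printed percentages in the listings are whole numbers, a recomputed value is expected to agree with the printed one only to within about one percentage point.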

Example 1974 

A DNA sequence (GBSx2083) was identified in S.agalactiae <SEQ ID 6111> which encodes the amino 
acid sequence <SEQ ID 6112>. Analysis of this protein sequence reveals the following:

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3299 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1975 

A DNA sequence (GBSx2084) was identified in S.agalactiae <SEQ ID 6113> which encodes the amino 
acid sequence <SEQ ID 6114>. This protein is predicted to be repressor protein-related protein. Analysis of
this protein sequence reveals the following: 
Possible site: 32 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2721 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9919> which encodes amino acid sequence <SEQ ID 9920>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC98432 GB:L29324 repressor protein [Streptococcus pneumoniae] 
Identities = 38/65 (58%), Positives = 52/65 (79%), Gaps = 1/65 (1%)


Query: 2 MYRRLRDLREDNDFTQKYVAEK-LSFTHSAYSKIERGERILSADVIIKLSNLYNVSTDYL 60

M +R+RDLRED+D TQ+YVA+ L+ T SAYSK+E G R++S D +IKL++ YNVS DYL 
Sbjct: 1 MLKRIRDLREDDDLTQEYVAKTlLNCTRSAYSKr-IESGTRIilSIDDLIKLADFYNVSLDYL 60 

Query: 61 LGQTD 65

+G+ D 
Sbjct: 61 VGRVD 65 

There is also homology to SEQ ID 582. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1976

A DNA sequence (GBSx2085) was identified in S.agalactiae <SEQ ID 6115> which encodes the amino 
acid sequence <SEQ ID 6116>. This protein is predicted to be relaxase. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.3150 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

= 5/417 (1%) 

MVITKHYAVHGKKYRRQLIKYILDPKKTRNL^LISDFGMSNYLDFPDYVELVKMYQNNFL 60
MVITKH+A+HGK YR +LIKYIL+P KT+NL+L+SDFGM NYLDFP Y ELVKMY +NFL
MVITKHFAIHGKNYRSKLIKYILNPSKTKNLTLVSDFGMRNYLDFPSYKELVKMYNDNFL 60



ATHVD+ H HNHII+NSI+ S KK WDY E NL+M+SDR+SK+AGAKII RYSIIR 








YEVYR++N+KYE+KQR++FL+E+S +F D +KA+ L++KIDF KH +FMTD NMKQ 



KL++++PY++ YF++ F +++I ILEFLL + + ++L+++A + GL++ K+K 



h E + ++Y+ +K RDAVHEFEVE+ QIE++V G+++KV GI ++ L 
3EEMVWKSYQDFKRNRI1AVHEFEVEI^-IMQIEEVVEEIGIYIKVQFGIDKKDL 412 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6117> which encodes the amino acid 
sequence <SEQ ID 6118>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3114 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 103/218 (47%), Positives = 170/218 (77%)

Query: 393 EEQIEKIVLDGLFVKVWMGIGQEGLIFIPNHQLNILEQENKKQYQVFIRETSSYFIYHKE 452 

E QIE+++ + +++KV + Q GLIFIPN+QL+I ++EN K+Y+V+IRET+ +FIY+KE
Sbjct: 2 EHQIERLIAEDIYIKVSFSVKQSGLIFIPNYQLDIRKEENHKKYKVYIRETAQFFIYNKE 61 

Query: 453 DSEMNRFMKGRDLIRQLTFDNKSLPYKRRISLVSLQQKIEEINLLMTLNIQNKSFLELKD 512

SE+NR+M+G +LI QLT D+KS+P +RR ++ +L++KIEEI+LL+ L+ +NK + ++KD 
Sbjct: 62 ASELNRYMRGHELICQLTNDSKSIPKRRRC3TIDTLKKKIEEISLLIEI1DTENKPYQDIKD 121 

Query: 513 ELVGDIAQLDIELTNLQDKOTTLNKMAEW^ 572 

++V D+AQLD+ +T LQD LNK+AEV++NL +++ + ++LA+Y+ +KMNL+ + 1 
Sbjct: 122 DIVKEMAQLDLTITELQDHIAHLNKVAEVI^ 181 

Query: 573 QIESEIEMIQNQLDNKIEEYENAVRKLDEXVRVLHMDK 610 

++E EIE QN+L+ I+EYE VR+L+++ +L+ K 
Sbjct: 182 EVEKEIETSQNELNISIDEYEYLVRRLEKFGEILSDSK 219

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1977 

A DNA sequence (GBSx2086) was identified in S.agalactiae <SEQ ID 6119> which encodes the amino 
acid sequence <SEQ ID 6120>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4006 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: ! 

+R IRK+ + E +QI + M ++G + FS F R LL D Q +Q+E+W + QK 
Sbjct: 5 IRSIRKQFRLTETEEKQILDLMREKGDDNFSDFLRKSLLLSDGQ--KQMEKWFNLWKKQK 62 

Query: 65 VEQIYRDVHEILVLAKLSQSVTMEHLEIILTCIKDLMKEIEVTIPLSYSFKDKYM 119

+EQI RDVHE+ ++AK + VT EH+ I+LTCI++L+KK+E T PLS F +KYM 
Sbjct: 63 LEQISRDVHEWIIAKTNHQVTHEHVSILLTCIQELIKEVEKTGPLSEDFCNKYM 117 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1978 

A DNA sequence (GBSx2087) was identified in S.agalactiae <SEQ ID 6121> which encodes the amino 
acid sequence <SEQ ID 6122>. This protein is predicted to be TnpA. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2935 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC82523 GB:AF027768 TnpA [Serratia marcescens] 



Identities = 175/413 (42%), Positives = 243/413 (58%), Gaps = 18/413 (4%)

MMFKVEAVGPPERCPECGFD-KLYKHSSRNQLIMDLPIRLKRVGLHLNRRRYKCRECGST 84 
M F+V+ V P C EGG + + R+ DLPI KRV L + RRRY CR C +T 
MHFQVD-VPDPIACEECGVQGEFWFGKRDVPYRDLPIHGKRVTLWVTRRRYTCRACKTT 59



++FETP+ LGIDE+++ +R R +LTNIE RT+ D+ R ++ V 



+MDMW PY+ AV +LPQA++WDKFHVVRMAN AL+ VRK L+ + + RTL +R 



ILLKR H4+++RE +++TW GPL AYE KE FY IWD 



! IRQVERMGRGYSFDAL 376 
+ K+ + DLVRAV NW E YF D +TNAYTESIN + + R GRGYSF+ + 
IPKGQKEWSDLVRAVGNWREETMTYFETDMPVTNAYTESimiiAKDKNREGRGYSFEVM 357 



RA++L+ K HKK+ P 



No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1979 

A DNA sequence (GBSx2088) was identified in S.agalactiae <SEQ ID 6123> which encodes the amino 
acid sequence <SEQ ID 6124>. This protein is predicted to be mercuric reductase. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2115 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 1 MNKFKVNISGMTCTGCEKHVESALEKIGAKNIESSYRRGEAVFELPDDIEVESAIKAIDE 60 

M K++V++ GMTCTGCE+HV ALE +GA IE +RRGEAVFELP+ + VE+A KAI + 
Sbjct: 1 MKCYRVDVQGMTCTGCEEHVAVALENMGATGIEVDFRRGEAVFELPNALGVETAKKAISD 60 

Query: 61 ANYQAGEIEEVSSLENVALINEDNYDLLIIGSGAAAFSSAIKAIEYGAKVGMIERGTVGG 120 

A YQ G+ EEV S E V L NE +YD +IIGSG AAFSSAI+A++YGAKV MIERGT+GG 
Sbjct: 61 AKYQPGKAEEVQSQEMVQLGNEGDYDYIIIGSGGAAFSSAIEAVKYGAKVAMIERGTIGG 120

Query: 121 TCVNIGCVPSKTLLRAGEINHLSKDNPFIGLQTSAGEVDLASLITQKDKLVSELRNQKYM 180

TCVNIGCVPSKTLLRAGEINHL+K+NPF+GL TSAGEVDLA LI QK++LV+ELRN KY+
Sbjct: 121 TCVNIGCVPSKTLLRAGEINHLAKNNPFVGLHTSAGEVDLAPLIKQKNELVTELRNSKYV 180






Query: 181 DLIDEYNFDLIKGEAKFVDASTVEVKGTKLSAKRFLIATGASPSLPQISGLEKMDYLTST 240 

DLID+Y F+LI +GEAKFVD TVEVNG +SAKRFLIATGASP+ P I GL ++DYLTST 
Sbjct: 181 DLIDDYGFELIEGEAKFVDEKTVEVNGAPISAKRFLIATGASPAKPNIPGLNEVDYLTST 240

Query: 241 TLLELKKIPKRLTVIGSGYIGMELGQLFHHLGSEITLMQRSERLLKEYDPEISESVEKAL 300 

+LLELKK+PKRL VIGSGYIGMELGQLFH+LGSE+TL+QRSERLLKEYDPEISESVEK+L
Sbjct: 241 SLLELKKOTKRLWIGSGYIGMELGQLFHNLGSEVTLIQRSERLLKEYDPEISESVEKSL 300 

Query: 301 IEQGINLVKGATFERVEQSGEIKRVYITVNGSREVIESDQLLVATGRKPNTDSLWLSAAG 360 

+EQGINLVKGAT+ER+EQ+G+IK+V+V VNG + +IE+DQLLVATGR PNT +LNL AAG 
Sbjct: 301 TOQGINLVKGATYERIEQNGDIiaWHVEVNGKICRIIEADQLLVATGRTPNTATLNLRAAG 360 

Query: 361 WTGKlMEILINDFGQTSNEKIYAAGDVTLGPQFVYVAAYEGGIITDNAIGGIJilKKIDLS 420 

VE G EI+I+D+ +T+N +IYAAGDVTLGPQFVYVAAY+GG+ NAIGGLNKK++L 
Sbjct: 361 VEIGSRGEIIlDDYSRTTOTRIYAAGDVTLGPQFVYVAAYQGGVAAPNAIGGLlilKKLNLE 420 

Query: 421 WPAVTFTNPTVATVGLTEEQAKEKGYDVKTSVLPLDAVPRAIVNRETTGVFKLVADAET 480 

WP VTFT P +ATVGLTE+QAKE GY+VKTSVDPLDAVPRA+VNRETTGVFKLVAD++T 
Sbjct: 421 WPGVTFTAPAIATVGLTEQQAKENGYEVKTSVLPLDAVPRALVNRETTGVFKLVADSKT 480 

Query: 481 LIOTLGVHIVSENAGDVIYAASLAVKFGLTIEDLTETLAPYLTMAEGLKLVALTFDKDISK 540 

+KVLG H+V+ENAGDVI YAA+LAVKFGLT+ +D+ ETLAPYLTMAEGLKL ALTFDKDISK 
Sbjct: 481 MK^GAHWAENAGDVIYAATIAWFGLTVDDIRETLAPYLTMAEGLKLAALTFDKDISK 540 

Query: 541 LSCCAG 546 

LSCCAG 
Sbjct: 541 LSCCAG 546 

There is also homology to SEQ ID 1820. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 1980 

A DNA sequence (GBSx2089) was identified in S.agalactiae <SEQ ID 6125> which encodes the amino 
acid sequence <SEQ ID 6126>. This protein is predicted to be regulatory protein. Analysis of this protein
sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4529 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA83973 GB:AF138877 mercury resistance operon negative 
regulator MerR1 [Bacillus sp. RC607]
Identities = 83/129 (64%), Positives = 104/129 (80%) 



Query: 1 MIYRISEFADKCGVNKETIRYYERKNLLQEPHRTEAGYRIYSYDDVKRVGFIKRIQEFGF 60

M +RI E ADKCGVNKETIRYYER L+ EP RTE GYR+YS V R+ FIKR+QE GF
Sbjct: 1 MKFRIGELADKCGVNKETIRYYERLGLIPEPERTEKGYRMYSQQTVDRLHFIKRMQELGF 60

Query: 61 SLSEIYKLLGVVDKDEVRCQDKFEFVSICICQI<EVQKQIEDLKRIETMLDDLKQRCPDEKKL 120

+L+EI KLLGWD+DE +C+DM++F K +++Q++IEDLKRIE ML DLK+RCP+ K +
Sbjct: 61 TLNEIDKLLGVVDRDFAKCM3MYDFTILKIEDIQRKIEDLKRIERMLMDLKERCPENKDI 120

Query: 121 HSCPIIETL 129

+ CPIIETL
Sbjct: 121 YECPIIETL 129

There is also homology to SEQ ID 1712.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1981

A DNA sequence (GBSx2090) was identified in S.agalactiae <SEQ ID 6127> which encodes the amino 
acid sequence <SEQ ID 6128>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.86 Transmembrane 80 - 96 ( 78 - 100)

Final Results

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8943> and protein <SEQ ID 8944> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -13.52 
GvH: Signal Score (-7.5): -6.14

Possible site: 44 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -7.86 threshold: 0.0 

INTEGRAL Likelihood = -7.86 Transmembrane 80 - 96 ( 78 - 100) 
PERIPHERAL Likelihood = 1.80 136

modified ALOM score: 2.07 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF02021(439 - 666 of 1080) 

GP|451734|gb|AAA18975.1||U05143 (9 - 46 of 46) envelope glycoprotein {Simian
immunodeficiency virus} GP|451744|gb|AAA18980.1||U05148 envelope glycoprotein {Simian
immunodeficiency virus}
%Match =3.2

%Identity =38.5 %Similarity =64.1

Matches = 15 Mismatches = 13 Conservative Sub.s = 10 



RIPVQFKGCDDYYNFJWGYPLSRINLEHYLTEGGVLYFWYSroVSPT\T?YASLTPKVIKNVLPASDKKKRIKl 
=11 I « ll'|::||> 1= 
WGLTGNAGTTPTATTTTTTPRVVENVINESN 

LFWMAIIAKLLILPYPALQTSYKSRPCLRRSSLRKLTQIPFSIOT
ll»" =1 Ml 






SEQ ID 8944 (GBS415) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 79 (lane 3; MW 21.2kDa). 

Example 1982 

A DNA sequence (GBSx2092) was identified in S.agalactiae <SEQ ID 6129> which encodes the amino 
acid sequence <SEQ ID 6130>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3402 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1983 

A DNA sequence (GBSx2093) was identified in S.agalactiae <SEQ ID 6131> which encodes the amino 
acid sequence <SEQ ID 6132>. This protein is predicted to be ATPase. Analysis of this protein sequence
reveals the following: 



>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.08 Transmembrane 324 - 340 ( 317 - 343)
INTEGRAL Likelihood =        Transmembrane 662 - 678 ( 660 - 690)
INTEGRAL Likelihood =        Transmembrane 350 - 366 ( 346 - 378)
INTEGRAL Likelihood =        Transmembrane 94 - 110 ( 93 - 110)
INTEGRAL Likelihood =        Transmembrane 681 - 697 ( 680 - 699)
INTEGRAL Likelihood = -1.38 Transmembrane 148 - 164 ( 148 -



Final Results 

bacterial membrane --- Certainty=0.5034 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22858 GB:M90750 cadmium-efflux ATPase [Bacillus firmus] 
Identities = 486/725 (67%), Positives = 584/725 (80%), Gaps = 18/725 (2%) 

Query: 1 MSRGKAKQSEKEMKAYRVQGFTCTNCAAIFENNVKELPGVQDAKVNFGASKVYVKGTTTI 60

MS KA SE+EMKAYRVQGFTC NCA FE NVK+L GV+DAKVNFGASK+ V G TI
Sbjct: 1 MSDQKAITSEQEMKAYRVQGFTCANCAGKFEKNVKQLSGVEDAKVNFGASKIAVYGNATI 60

Query: 61 EELEKAGAFENLKIRDEKEQRVGGE PFWKQKENI KVYI SALLL WSWFL 109 

EELEKAGAFENLK+ EK R + PF+K K + +Y S LL+ + 

Sbjct: 61 EELEKAGAFENLKVTPEKSARQASQEVKEDTKEDKVPFYK- KHSTLLYAS - LLITFGYLS 118 



Query: 110 GEQYGEEHVLPTIGYARSILIGGYSLFIKGLKNLRRLNFDMNTLMTIAIIGAAIIGEWGE 169 
GEE+++ T+ + AS+ IGG SLF GL+NL R FDM TLMT+A+IG AIIGEW E 

Sbjct: 119 SYWGEENIOTTLLFIASMFIGGLSLFKVGLQiniLRFEFDMKTLMTVAVIGGAIIGEWAE 178

Query: 170 GATWILFAISEALERYSMDKARQSIESLMDIAPKEALIRRGNEEMMIHVDEIQVGDIMI 229 

A WILFAISFALER4SMD+ARQSI SLMDIAPKEAL++R +E+MIHVD+I VGDIMI 
Sbjct: 179 VAIWILFAISEALERFSMDRARQSIRSLMDIAPKEALVKRNGQEIMIHVDDIAVGDIMI 238 

Query: 230 VKPGQKIAMDGIWKGTSTMQAAITGBSTOVTKITTTOEVFAGTIJWEEGLLEVKVTKRVE 289
VKPGQK+AMDG+W G S +NQ AITGESVPV K ++EVFAGTLNEEGLLEV++TK VE 

Query: 290 DTTLSKIIHLVEEAQAERAPSQAFVDKFAKYYTPAIVILALLIAWPPL-FGGDWSQWIY 348
DTT+SKIIHLVEEAQ EPAPSQAFVDKFAKYYTP I+I+A L+A+VPPL F G W WIY 
Sbjct: 299 DTTISKIIHLVEEAQGERAPSQAFVDKFAKYYTPIIMIIATLVAIVPPLFFDGSWETWIY 358 

Query: 349 QGIAVLWGCPCALWSTPVAVVTAIGNAAIQIGVLIKGGIHLEAAGHLKAIAFDKTGTLT 408

QGLAVLWGCPCALV+STP+++V+AIGNAAK GVL+KGG++LE G LKAIAFDKTGTLT 
Sbjct: 359 QGLAVLWGCPCALVISTPISIVSAIGNAAKKGVLVKGGVYLEEMGALKAIAFDKTGTLT 418

Query: 409 KGIPAVTD- - IVTYGRNENEliITITSAIEKGSQHPLASAIMRKAEENGLKFNEVTVEDFQ 466 

KG+PAVTD ++ NE EL++I +A+E SQHPIASAIM+KAEE + +++V VEDF 
Sbjct: 419 KGVPAVTDYNVLNKQINEKELLSIITALEYRSQHPLASAIMKKAEEENITYSDVQVEDFS 478 

Query: 467 SITGKGVKAKINNEMYYVGSQNLFEE-LHGSISSDKKEKIADMQTQGKTVMVLGTEKEIL 525

SITGKG+K +N YY+GS LF+E L D ++ + +Q QGKT M++GTEKEIL 

Sbjct: 479 SITGKGIKGIWGTTYYIGSPKLFKELLTNDFDKDLEQNVTTLQNQGKTAMIIGTEKEIL 538 

Query: 526 SFIAVADEMRESSKEVIGKLNNMGI-ETVMLTGDNQRTATAIGKQVGVSDIKADLLPEDK 584 

+ IAVADE+RESSKE+4 KL+ +GI +T+MLTGDN+ TA AIG QVGVSDI+A+L+P+DK 
Sbjct: 539 AVIAVADEVRESSKEILQKLHQLGIKKTIMLTGDNKGTANAIGGQVGVSDIEAELMPQDK 598 

Query: 585 LNFIKELREKHQSVGOTGDGVHDAPALAASTVGVAMGGAGTDTALETADIALMSDDLSKL 644 

L+FIK+LR ++ +V MVGDGVNDAPALAASTVG+AMGGAGTDTALETAD+ALM DDL KL 
Sbjct: 599 LDFIKQLRSEYGWMWGDGTODAPALAASTVGIAMGGAGTDTALETADVALMGDDLRKL 658 

Query: 645 PYTIKLSRKALAIIKQNITFSIAIKLVALLLVMPGWLTLWIAIFADMGATLLVTLNSLRL 704

P T+KLSRK L I IK NITF++AIK +A LLV+PGWLTLW1AI +DMGATLLV LN LRL 
Sbjct: 659 PSTVKLSRKTLNIIKANITFAIAIKFIASLLVIPGWLTLWIAILSDMGATLLVALNGLRL 718 

Query: 705 LKIKE 709 

Sbjct: 719 MRVKE 723 

There is also homology to SEQ ID 3506. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1984 

A DNA sequence (GBSx2094) was identified in S.agalactiae <SEQ ID 6133> which encodes the amino 
acid sequence <SEQ ID 6134>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0779 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1985 

A DNA sequence (GBSx2095) was identified in S.agalactiae <SEQ ID 6135> which encodes the amino 
acid sequence <SEQ ID 6136>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.92 Transmembrane 123 - 139 ( 115 - 145)
INTEGRAL Likelihood = -6.74 Transmembrane 172 - 188 ( 157 - 190)
INTEGRAL Likelihood = -1.81 Transmembrane 80 - 96 ( 80 - 96)



Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9923> which encodes amino acid sequence <SEQ ID 9924> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 4216. 

A related GBS gene <SEQ ID 8945> and protein <SEQ ID 8946> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 

McG: Discrim Score: -6.41 

GvH: Signal Score (-7.5): -2.23 
Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 3 value: -8.92 threshold: 0.0 

INTEGRAL Likelihood = -8.92 Transmembrane 123 - 139 ( 115 - 145) 
INTEGRAL Likelihood = -6.74 Transmembrane 172 - 188 ( 157 - 190) 
INTEGRAL Likelihood = -1.81 Transmembrane 80 - 96 ( 80 - 96) 
PERIPHERAL Likelihood =2.92 46 
modified ALOM score : 2.28 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1986 

A DNA sequence (GBSx2096) was identified in S.agalactiae <SEQ ID 6137> which encodes the amino 
acid sequence <SEQ ID 6138>. This protein is predicted to be histidine rich P type ATPase (HRA-1) 
(copB). Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.37 Transmembrane 318 - 334 ( 307 - 345) 
INTEGRAL Likelihood = -5.84 Transmembrane 347 - 363 ( 335 - 364) 
INTEGRAL Likelihood = -5.15 Transmembrane 88 - 104 ( 86 - 112) 
INTEGRAL Likelihood = -5.04 Transmembrane 651 - 667 ( 649 - 669)
INTEGRAL Likelihood = -4.30 Transmembrane 156 - 172 ( 155 - 173)
INTEGRAL Likelihood = -4.30 Transmembrane 669 - 685 ( 668 - 690)
INTEGRAL Likelihood = -3.03 Transmembrane 62 - 78 ( 60 - 80)



Final Results 

bacterial membrane --- Certainty=0.6349 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA62113 GB:U16658 histidine rich P type ATPase [Escherichia
coli] 

Identities = 598/731 (81%), Positives = 651/731 (88%), Gaps = 36/731 (4%)

Sbjct: 1 MRNNKQHSE 

Query: 37 NMDHSEMDHGAMGGHAHHHHGSFKEIFLKSLPLGIAILLITPMMDIQL 84 

MD+SEMDHGAMGGHAHHHHGSFK+IFLKSLPLGIAILLITP+M IQL 
Sbjct: 61 HNEMKHSQMDHSKMDYSEMDHGAMGGHAHHHHGSFKDIFLKSLPLGIAILLITPLMGIQL 120 

Query: 85 PFQIIFPYADWAAVLATILYIFGGKPFYMGAKDEFNSKAPGMMSLITLGITVSYAYSVY 144 

PFQIIFPYADWAAVLATILYIFGGKPF MGAKDEFNSK PGMMSLITLGITVSYAYSVY 
Sbjct: 121 PFQIIFPYADWAAVLATILYIFGGKPFLMGAXDEFNSKVPGMMSLITLGITVSYAYSVY 180 

Query: 145 AVAARYVTGEHVMDFFFEFTTLILIMLLGHWIEMKALGEAGDAQKALAELVPKDAHVVLE 204

AVAARYVTGE VMDFFFEFTTLILIMLLGHWIEMKALGEAG+AQKALAELVPKDAHVVLE
Sbjct: 181 AVAARYVTGEPVMDFFFEFTTLILIMLLGHWIEMKALGEAGNAQKALAELVPKDAHVVLE 240

Query: 205 DDSIETRPVSELQIGDVIRVQAGENVPADGIIIRGESRVNEALVTGESKPIEKKTGDEVI 264 

DDSIETRPV++LQ+GD+IRVQAGENVPADG I RGESRVNEALVTGESKPIEK  GDEVI
Sbjct: 241 DDSIETRPVADLQVGDLIRVQAGENVPADGTIQRGESRVNEALVTGESKPIEKNPGDEVI 300

Query: 265 GGSTNGGGVLYVEIKQTGDQSFISQVQTLISQAQSQPSRAENVAQKVASWLFYIAVWAL 324 

GGSTNG GVLYVEIKQTGD+SFISQVQTLISQAQSQPSRAEN+AQKVA WLFYIAV+ AL 
Sbjct: 301 GGSTNGDGVLYVEIKQTGDKSFISQVQTLISQAQSQPSRAENLAQKVAGWLFYIAVIAAL 360 

Query: 325 IALLIiWIIADLPTAVIFTVTALVIACPHALGLAIPLWSRSTSLGASRGLLVItNREALE 384 

IAL+IW + IAD+PTAVI FTVT LVIACPHALGLAIPLV +RSTSLGASRGLLVK+R+ALE 
Sbjct: 361 IALVIWWIADVPTAVIFTVTTLVIACPHALGIAIPLVTARSTSLGASRGLLVKDRDALE 420

Query: 385 LTTKADVMVLDKTGTLTTGEFKVLDVTVLSDKYSEEEITGLLAGIEAGSSHPIAQSIVNH 444

LTT ADVMVLDKTGTLTTGEFKVLDV + +DKY+++EI LL+GIE GSSHPIAQSI+++ 
Sbjct: 421 LTTNADVMVLDKTGTLTTGEFKVLDVELFNDKYTKDEIVALLSGIEGGSSHPIAQSIISY 480 

Query: 445 AEAKGIKSVSFDSIEIVSGAGIEGEANGHHYQLISQKAYGKALRMDIPKGATLSILVENN 504

AE +GI+ VSFDSI+++SGAG+EG+ANGH YQLISQKAYG+ L MDIPKGAT+S+LVEN+
Sbjct: 481 AEQQGIRPVSFDSIDVMSGAGVEGQANGHRYQLISQKAYGRNLDMDIPKGATISVLVEND 540

Query: 505 EAIGAVALGDELKETSRNLIEVLKKYGIEPLMATGDNEEAAQGVAEVLGIQYQANQSPED 564

EAIGAVALGDELK TS++LI+ LKK I+P+MATGDNE+AAQG AE+LGI Y ANQSP+D 
Sbjct: 541 EAIGAVALGDELKPTSKDLIQALKKNKIQPIMATGDNEKAAQGAAEILGIDYLANQSPQD 600 

Query: 565 KYKLVESMKNQNKTVIMVGDGVNDAPSLALADVGIAIGAGTQVALDSADIILTQSDPGDI 624

KY+LVE +K + K VIMVGDGVNDAPSLALADVGIAIGAGTQVALDSADIILTQ PGDI
Sbjct: 601 KYELVEKLKAEGKWIMVGDGVNDAPSLALADVGIAIGAGTQVALDSADIILTQYSPGDI 660

Query: 625 ESFIELANKTTRKMKQNLWGAGYNFIAIPIAAGLLAPIGITLGPAFGAVLMSLSTVIVA 684

SFIELA KTTRKMK+NLWGAGYNFIAIPIAAG+LAPIGITL PA AVLMSLSTVIVA
Sbjct: 661 ASFIELAQKTTRKMKENLWGAGYNFIAIPIAAGILAPIGITLSPAVAAVLMSLSTVIVA 720

Query: 685 INAMTLKLEPK 695 

INAMTLKLEPK 
Sbjct: 721 INAMTLKLEPK 731 

There is also homology to SEQ ID 3506.

A related GBS gene <SEQ ID 8947> and protein <SEQ ID 8948> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: -19.12 
GvH: Signal Score (-7.5): -3.71 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 7 value: -13.37 threshold: 0.0 



INTEGRAL Likelihood =-13.37 Transmembrane
INTEGRAL Likelihood = -5 Transmembrane
INTEGRAL Likelihood = -5 Transmembrane
INTEGRAL Likelihood = -5 Transmembrane
INTEGRAL Likelihood = -4 Transmembrane
Likelihood = -4 Transmembrane
Likelihood = -3
Likelihood = 0
129 - 145
modified ALOM score: 3.17

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6349 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF02015(220 - 2304 of 2604) 

EGAD|37454|38974(1 - 731 of 731) histidine rich P type ATPase (HRA-1) {Escherichia coli}
GP|643613|gb|AAA62113.1||U16658 histidine rich P type ATPase {Escherichia coli}
PIR|JC2464|JC2464 probable copper-transporting ATPase (EC 3.6.1.-) HRA-1
Enterobacteriaceae spp.
%Match =67.4
%Identity =85.9 %Similarity =93.7

Matches = 598 Mismatches = 43 Conservative Sub.s = 54 



162 192 222 252



--DHSKHDHNEMEHSQMDHSNMDHSEMDHGAMG3HAHHHHGSPKEIFLKSLPLGIAILLITPMMDIQLPFQIIFPYADV 
1111111111 = 1111111 I|:|1II1IIII1MIIIIIII|:|IIII1II1IIIIIII|:| I I I I I I I I I I I I I I 



534 564 594 624 654 684 714 744 

VAAVLATILYIFGGKPFYMGAKDEFNSKAPGMMSLITLGITVSYAYSVYAVAARYVTGEHVMDFFFEFTTLILIMLLGHW

iiiiiiiiiiiiiiiii minim iiimimiiMiimmiiiiiiii iimiiiiiiiiiiiiiiii 

VAAVLATILYIFGGKPFLMGAKDEFNSKVPGMMSLITLGITVSYAYSVYAVAARYVTGEPVMDFFFEFTTLILIMLLGHW 
150 160 170 180 190 200 210 

774 804 834 864 894 924 954 984

IEMKALGFAGDAQKALAELVPKDAHVVLEDDSIETRPVSELQIGDVIRVQAGFJSIVPADGIIIRGESRVNFALVTGESKPI 

IIIIMIII|:||||||||||||||||||||lllllll-ihll = lllllilllllll I : I I I I 

IEMKALGEIAGNAQKAIAELVPKDAIT^EDDSIETRPvADLQ 

230 240 250 260 270 280 290 


1014 1044 1074 1104 1134 1164 1194 1224 

EKKTGDEVIGGSTNGGGVLYVEIKQTGDQSFISQVQTLISQAQSQPSFAEN^/AQKVASWLFYIAVWALIALLIWTIIAD 

ii miimiii imimmmmimiimiiiiiiimiim mini: 1111101 : m 

EKNPGDEVIGGSTNGDGVLYVEIKQTGDKSFISQVQTLISQAQSQPSR 

310 320 330 340 350 360 370



1254 1284 1314 1344 1374 1404 1434 1464 

LPTAVI FTVTALVIACPHALGLAI PLWSRSTSLGASRGLLVKIjIREALELTTKADVMVLDKTGTLTTGEFKVIjDVTVLSD 
=11)111111 IIIIIIIIIIIIIIII =llllllllllllll=l=llllll I llllll IIIM III M :::| 
VPTAVIFTVTTLVIACPHALGIAIPLVTARSTSLGASRGL^^ 

390 400 410 420 430 440 450 



1494 1524 1554 1584 1614 1644 1674 1704 

KYSEEEITGLLAGIEAGSSHPIAQSIVNHAEAKGIKSVSFDSIEIVSGAGIEGEANGHHYQLISQKAYGKALRMDIPKGA
ll===ll 11 = 111 I I I I I I I I I I = = = I I =11= I I I I I I = = = I I I I = I I = I I I I 1111111111= I III
KYTKDEIVALLSGIEGGSSHPIAQSIISYAEQQGIRPVSFDSIDVMSGAGVEGQANGHRYQLISQKAYGRNLDMDIPKGA
470 480 490 500 510 520 530 

1734 1764 1794 1824 1854 1884 1914 1944

TLSILVENNFAIGAVALGDELKETSRNLIEVLKKYGIEPU-1^^ 

:hl Ihll.llllllll ||::||: III I : I : I I I I I I I : I I I I 11 = 111 I I I I I I : I I I : I I I =1 = 
TISVLvElTOFJVIGAVALGDELKPTSKDLIQAL^^ 

550 560 570 580 590 600 610 


1974 2004 2034 2064 2094 2124 2154 2184 

NKTVIWGDGV^APSLALADVGIAIGAGTQVALDSADIILTQSDPGDIESFIELANKTTRKMKQl^WGAGYNFIAIP^ 

i mi: ni M' ' : iiii i iii ii ii iii. mi nun iniiihiiiiiiiiiiiiin 

GKCTI^GDGVNDAPSIJU^ADVGIAIGAGTQVALDSADIILTQYSPGDIASFIEIiAQKTTRKMKENLWGAGYlIFIAIPI 
630 640 650 660 670 680 690

2214 2244 2274 2304 2334 2364 2394 2424 

AAGLIAPIGITLGPAFGAVLMSLSTVIVAINAMTLKLEPK*NEAGTKKHWLV*PPSRIGSDQLVCCIRKIIDR*IFDKNR 
111=11111111 II I I lllllllllllllll II 
AAGILAPIGITLSPAVAAVLMSLSTVIVAINAMTLKLEPK
710 720 730 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1987

A DNA sequence (GBSx2097) was identified in S.agalactiae <SEQ ID 6139> which encodes the amino 
acid sequence <SEQ ID 6140>. This protein is predicted to be CopA. Analysis of this protein sequence 
reveals the following: 

Possible site: 59 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2197 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA40599 GB:X57326 ORF-1 [Thiobacillus ferrooxidans]
Identities = 26/65 (40%) , Positives = 40/65 (61%) , Gaps = 2/65 (3%) 

Query: 1 MKQEILL- -DGWCAGCANTVQERFSAIEGVESVEVDLATKKAVLESQTEIDTETLNAAL 58 

M Q+I L G+ CA CA++V++ I G++S +V LAT +A + Q+ I TE L AA4 
Sbjct: 1 MSQKIFLRITGMTCAHCAHSVEKALLGIHGIDSAQVSIATNOAEVFLQSSIPTEALLAAV 60 

Query: 59 AETNY 63 
+ Y 

Sbjct: 61 TQAGY 65 
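
The summary line of each alignment reports counts of the form n/m. As an illustration only, a minimal Python sketch for recovering those counts and the corresponding fractions is given below; the function name `summary_fractions` and the regular expression are hypothetical helpers introduced here, not part of the original analysis, and the printed percentages in the text are not recomputed or checked by it.

```python
import re

# Matches "Label = n/m" pairs in a BLAST summary line.
# Optional whitespace is allowed so that OCR spacing does not break the match.
PAIR = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def summary_fractions(line):
    """Return {label: (count, aligned_length, fraction)} for one summary line."""
    out = {}
    for label, num, den in PAIR.findall(line):
        n, d = int(num), int(den)
        out[label] = (n, d, n / d)
    return out

# Summary line copied from the alignment above.
line = "Identities = 26/65 (40%) , Positives = 40/65 (61%) , Gaps = 2/65 (3%)"
stats = summary_fractions(line)
for label, (n, d, frac) in stats.items():
    print(f"{label}: {n}/{d} = {frac:.3f}")
```

The sketch reads only the counts, so it can be run over every "Identities = ..." line in the text regardless of how the percentage was rounded in the original output.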



There is also homology to SEQ ID 3510.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1988 

A DNA sequence (GBSx2098) was identified in S.agalactiae <SEQ ID 6141> which encodes the amino 
acid sequence <SEQ ID 6142>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3220 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
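
Each "Final Results" block lists three candidate localizations with a certainty value. The following Python sketch, offered only as an illustration, reads such a block and returns the localization with the highest certainty. The names `parse_final_results` and `best_location` are hypothetical, and the pattern is an assumption that tolerates the stray spaces around the decimal point seen in this text.

```python
import re

# One "Final Results" row: location, certainty, verdict.
# Spaces around the decimal point are optional to absorb damage such as "0 . 0000".
ROW = re.compile(
    r"bacterial\s+(\w+)[\s\-—]*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]+)\)"
)

def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    rows = []
    for loc, whole, frac, verdict in ROW.findall(text):
        rows.append((loc, float(f"{whole}.{frac}"), verdict.strip()))
    return rows

def best_location(rows):
    """Pick the row with the highest certainty (first row wins on ties)."""
    return max(rows, key=lambda r: r[1])

# Rows as they appear for SEQ ID 6142, including the irregular spacing.
block = """
bacterial cytoplasm Certainty=0. 3220 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0. 0000 (Not Clear) < succ>
"""
rows = parse_final_results(block)
print(best_location(rows))
```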

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 1989 

A DNA sequence (GBSx2099) was identified in S.agalactiae <SEQ ID 6143> which encodes the amino 
acid sequence <SEQ ID 6144>. This protein is predicted to be heavy-metal transporting P-type ATPase 
(b0484). Analysis of this protein sequence reveals the following: 
Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.09 Transmembrane 131 - 147 ( 130 - 150) 

Final Results 

bacterial membrane --- Certainty=0.2635 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB01764 GB:U42410 heavy-metal transporting P-type ATPase 
[Proteus mirabilis] 
Identities = 98/153 (64%) , Positives = 123/153 (80%) 

^ +A+KAL G++V MITGDNK TAKAIAKQ+GID +++EVLP+ K +K+L + G KVA 

Sbjct: 649 KAIKALHALGLKVAMITGDNKATAKAIAKQLGIDEIVAEVLPDGKVAALKQLSQKGDKVA 708 

Query: 62 MVGDGINDAPAIACANVGIAVGSGTDVAIESADIVLMRNDLTAVLTTIDLSHATLRNIKQ 121 

VGDGINDAPALAQA+VG+A+G+GTDVAIE+AD+VLM DL V+ I LS AT+RNIKQ 
Sbjct: 709 FVGDGINDAPALAQADVGLAIGTGTDVAIEAADVVLMSGDLRGVVDAIALSQATIRNIKQ 768 

Query: 122 NLFWAFAYNLVGIPVAMGLLYIFGGLLMSPMLA 154 

NLFW FAYN + IPVA G+LY G+L+SP+ A 
Sbjct: 769 NLFWTFAYNALLIPVAAGMLYPINGMLLSPIFA 801 
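
Because the alignment lines in this text were recovered by character recognition, it can be useful to check that the start and end coordinates on a line agree with the number of residues printed between them. The Python sketch below is a hypothetical consistency check under the assumption that each line has the layout "label start sequence end" and that "-" denotes a gap; it does not correct any sequence.

```python
def check_alignment_line(line):
    """Return (label, residue_count, coordinate_span) for one alignment line.

    Gap characters ("-") are not counted as residues. Spaces that OCR
    inserted inside the sequence are ignored by joining the middle fields.
    """
    parts = line.split()
    label, start, end = parts[0].rstrip(":"), int(parts[1]), int(parts[-1])
    seq = "".join(parts[2:-1])
    residues = sum(1 for ch in seq if ch != "-")
    return label, residues, end - start + 1

# Last pair of lines from the alignment above.
query = "Query: 122 NLFWAFAYNLVGIPVAMGLLYIFGGLLMSPMLA 154"
sbjct = "Sbjct: 769 NLFWTFAYNALLIPVAAGMLYPINGMLLSPIFA 801"
for line in (query, sbjct):
    label, residues, span = check_alignment_line(line)
    print(label, residues, span, residues == span)
```

A mismatch between the two numbers on a line indicates that characters were lost or merged during extraction, as is the case for several alignment lines in this section.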

A related DNA sequence was identified in S.pyogenes <SEQ ID 3505> which encodes the amino acid 
sequence <SEQ ID 3506>. Analysis of this protein sequence reveals the following: 



Possible site: 36 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.83 Transmembrane 328 - 344 ( 314 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 370 ( 347 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 117 ( 100 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 181 ( 165 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 681 ( 662 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 83 ( 66 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 507 ( 490 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 707 ( 691 - [illegible])
INTEGRAL Likelihood = [illegible] Transmembrane [illegible] - 156 ( 139 - [illegible])

Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 92/152 (60%), Positives = 123/152 (80%) 

Query: 4 VKALRRRGTOVIMITGDNKRTAICAIAKQVGIDSVLSEVLPEDICAEEVKKLQEAGKKVAMV 63 
V+AL + G+ IM+TGD+ TAKAIA QVGI V+S+VLP+ KA + L+ G+KVAMV 
Sbjct: 544 VEALHQLGIHTIMLTGDHDATAKAIASQVGITDVISQVLPDQKAGVIADLRSQGRKVAMV 603

Query: 64 GDGINDAPALAQANVGIAVGSGTDVAIESADIVLMRNDLTAVLTTIDLSHATLRNIKQNL 123 

GDGINDAPALA A++GIA+GSGTD+AIESAD++LM+ D+ ++ + LS T+R +K+NL 
Sbjct: 604 GDGINDAPALAVADIGIAMGSGTDIAIESADVILMKPDMLDLVKAMSLSRVTMRIVKENL 663 


Query: 124 FWAFAYNLVGIPVAMGLLYIFGGLLMSPMLAG 155

FWAF YN++ IPVAMGLL++FGG L++PMLAG
Sbjct: 664 FWAFIYNVLMIPVAMGLLHLFGGPLLNPMLAG 695



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 1990 

A DNA sequence (GBSx2100) was identified in S.agalactiae <SEQ ID 6145> which encodes the amino 
acid sequence <SEQ ID 6146>. This protein is predicted to be CopY. Analysis of this protein sequence 
reveals the following: 



N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2067 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 8 TSITDAEWEVMRVWANDLVTSKTVISVLKEKIvlDWTESTIKTILGRLVEKGVLNTEQEGR 67 

TSI++AEWEVMRWWA + +S +I++L W+ STIKT++ RL EKG L ++++GR 

Sbjct: 2 TSISNAEWEVMRVWAKQMTSSSEIIAILSRTYa-ISASTIKILITRLSEKGYLTSQRQGR 61 

Query: 68 KFIYTAMIVEKFAVRDFAEDIFNRICKiaOTGlWIGSIIEDHvLSFDDIDRLEKILEIKKS 127 

K+IY++ I E+EA+ ++F+RIC K +1 ++E+ ++ DI++LE +L KK+ 

Sbjct: 62 KYIYSSLISEEFALEQQVSEVFSRICOTKHQALIRHLVEETPMTLSDIEKLFALLLSKKA 121 

Query: 128 FAVEEVDCQCTEGQCDCHE 146 

AV EV C C GQC C+E 
Sbjct: 122 NAVPEVKCNCIVGQCSCYE 140 



There is also homology to SEQ ID 3502.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1991 

A DNA sequence (GBSx2101) was identified in S.agalactiae <SEQ ID 6147> which encodes the amino 
acid sequence <SEQ ID 6148>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2829 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1992 

A DNA sequence (GBSx2102) was identified in S.agalactiae <SEQ ID 6149> which encodes the amino 
acid sequence <SEQ ID 6150>. This protein is predicted to be DS RF protein. Analysis of this protein 
sequence reveals the following: 
Possible site: 57 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-13.21 Transmembrane 142 - 158 ( 136 - 159)
INTEGRAL Likelihood = -3.45 Transmembrane 70 - 86 ( 66 - 88)
INTEGRAL Likelihood = -3.13 Transmembrane 178 - 194 ( 176 - 195)
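
The INTEGRAL lines give, for each predicted transmembrane segment, a likelihood value, a core range and an outer range in parentheses. As a sketch only, the Python below extracts those fields and orders the segments along the sequence. The name `parse_segments` and the dictionary keys are hypothetical, and the end of the first outer range, printed as "1S9" in the source text, is read as 159 here by assumption.

```python
import re

# "INTEGRAL Likelihood = -3.45 Transmembrane 70 - 86 ( 66 - 88)"
SEG = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return segments sorted by core start position."""
    segs = []
    for lik, start, end, outer_start, outer_end in SEG.findall(text):
        segs.append({
            "likelihood": float(lik),
            "core": (int(start), int(end)),
            "outer": (int(outer_start), int(outer_end)),
        })
    return sorted(segs, key=lambda s: s["core"][0])

# Lines for SEQ ID 6150; "159" is an assumed reading of the garbled "1S9".
text = """
INTEGRAL Likelihood =-13.21 Transmembrane 142 - 158 ( 136 - 159)
INTEGRAL Likelihood = -3.45 Transmembrane 70- 86 ( 66- 88)
INTEGRAL Likelihood = -3.13 Transmembrane 178 - 194 ( 176 - 195)
"""
segments = parse_segments(text)
for seg in segments:
    start, end = seg["core"]
    print(seg["core"], "length", end - start + 1, "likelihood", seg["likelihood"])
```

For the three lines shown, each core range spans 17 residues, and the most negative likelihood belongs to the segment at 142 - 158.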

Final Results 

bacterial membrane --- Certainty=0.6286 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



+P+ W++GLLGL+P+YLG++ I GE E+E+E 14-

D+ IYIPYF +L S+ +V 4-VF 1 + + C 4-S4- L4-S ISETIEKY+R IVP+

Sbjct: 18
63
Sbjct: 77
Query: 123
Sbjct: 135
Query: 183
Sbjct: 196

VFI LG+YI++E+GT
A related DNA sequence was identified in S.pyogenes <SEQ ID 6151> which encodes the amino acid
sequence <SEQ ID 6152>. Analysis of this protein sequence reveals the following:



Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.16 Transmembrane 143 - [illegible] ( 135 - 165)
INTEGRAL Likelihood = -9.13 Transmembrane 49 - 65 ( 43 - 71)
INTEGRAL Likelihood = -7.17 Transmembrane 73 - 89 ( 72 - 94)
INTEGRAL Likelihood = -6.00 Transmembrane 13 - 29 ( 9 - 33)
INTEGRAL Likelihood = -2.71 Transmembrane 180 - 196 ( 179 - 197)
INTEGRAL Likelihood = -0.59 Transmembrane 112 - 128 ( 109 - 128)



Final Results 

bacterial membrane --- Certainty=0.6265 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAF42284 GB:AE002544 cadmium resistance protein. [Neisseria 
meningitidis MC58] 
Identities = 201/208 (96%) , Positives = 205/208 (97%) 

Query: 1 MRCFMIQNWTSIILYSGTAVDBLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLL 60 

MRCFMIQNWTSIILySGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLGLL 
Sbjct: 1 MRCFMIQNWTS I ILYSGTAVDBLI ILMLFFAKRKSRKDI INIYLGQFLGSVSLILLSLL 60 

Query: 61 FAFVLDYIPSKE I LGLLGLI PI FLGLKVLLLGDSDGEAIAKEGLSKDNKNLI FLVAMITF 120 

FAFVLDYIPSKEILGLLGLIPI LG+KVLLLGDSDGEAIAK3GL KDNKNLIFLVAMITF 
Sbjct: 61 FAFVI£)YIPSKEILGLLGLIPILI^IKVLLLGDSDGEAIAKEGLRKDWKNLIFI,VAMITF 120 

Query: 121 ASCGADNIGVFVPYFTTLMLANLIVAIiTFLVMIYLLVFSAQKLAQVPSVGETLEKYSRW 180 

ASCSADNIGVFVPYFTTLNLflNLIVALLTFLVMIYLLVFSAQKLAQVPSVGETLEKYSRW 
Sbjct: 121 ASCGADNIGVFVPYFTTLT^ANLIVALLTFLVMIYLLVFSAQKIAQVPSVGETLEKYSRW 180 

Query: 181 FIAWYLGLGMYILIENNSFDMLWAVLG 208 

F+AWYLGLG+YIL+ENNSFDMLW VLG 
Sbjct: 181 FVAV/YLGLGIYILVENNSFDMLWTVLG 208 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 71/200 (35%) , Positives = 130/200 (64%) , Gaps = 4/200 (2%) 

Query: 1 MGQTIISAIGvYISTSIDYLIVLIILFAQLSQMKQKWHIYAGQYLGTGLLVGASLVAAYV 60 

M Q +Y T++D LI+L++ FA+ K +IY GQ+LG+ L+ SL+ A+V 

Sbjct: 5 MIQKVVTSIILYSGTAVDLLIILMLFFAKRKSRKDIINIYLGQFLGSVSLILLSLLFAFV 64 

Query: 61 VNFVPEAl#WGLLGLIPIYLGIRFAIVGEGEEE3EEEIIERLEQSKANQLFWTVTLLTIA 120 

++++P ++GLLGLIPI+LG++ ++G+ + E +EL+ N+FV ++T A 
Sbjct: 65 LDYIPSKEILGLLGLIPIFLGLKVLLLGDSDGEAIAK--EGLSKDNKNLIF-LVAMITFA 121 

Query: 121 S-GGDNLGIYIPYFASLDWSQTLWLLVFAIGIIIFCELSWVLSSIPLlSETIEICfQRII 179 

S G DN+G+++PYF +L+ + +V LL F + I + + L+ 4-P + ET+EKY R 
Sbjct: 122 SCGADNIGVWPYFTTIJSn^LIVALLTFLVMIYLLWSAQKLAQVPSVGETLEKYSRWF 181 

Query: 180 VPLVFIPLGLYIMYESGTIE 199 

+ +V++ LG+YI+ E+ + + 
Sbjct: 182 IAWYLGLGMYILIENNSFD 201 

SEQ ID 6150 (GBS174) was expressed in and purified from E.coli. The purified protein is shown in lane 7 
of Figure 223. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1993

A DNA sequence (GBSx2103) was identified in S.agalactiae <SEQ ID 6153> which encodes the amino 
acid sequence <SEQ ID 6154>. This protein is predicted to be Pgm. Analysis of this protein sequence 
reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4324 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB96418 GB:AJ243290 phosphoglucomutase [Streptococcus thermophilus]
Identities = 65/76 (85%) , Positives = 71/76 (92%)

Query: 1 MTYTENLQKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 60 

M+YTEN QKWLDF +LP YLR EL+SMDEKTKEDAFYTNLEFGTAGMRG IGAGTNRINI 
Sbjct: 1 MSYTENYQKWLDFAELPAYLRDELVSMDEKTICEDAFYTNLEFGTAGMRGLIGAGTNRINI 60 


Query: 61 YWRQATEGLAKLIET 76 

YWRQATEGLA+LI++ 
Sbjct: 61 YWRQATEGLAQLIDS 76 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6155> which encodes the amino acid
sequence <SEQ ID 6156>. Analysis of this protein sequence reveals the following:

Possible site: 53

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4324 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 75/76 (98%) , Positives = 75/76 (98%) 

Query: 1 MTYTENLQKWLDFEQLPDYLRQELLSMDEKTKEDAFYTNLEFGTAGMRGYIGAGTNRINI 60 
MTYTEN QICWLDFEQLPDYLRQELLSMDEKTIQ2DAFYTNLEFGTAGMRGYIGAGTNRINI 
Sbjct: 1 MTYTENFQKWLDFEQLPDYLRQELLSKDEKTKEDAFYTKLEFGTAGMRGYIGAGTNRINI 60

Query: 61 YWRQATEGLAKLIET 76 

YWRQATEGLAKLIET 
Sbjct: 61 YWRQATEGLAKLIET 76 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1994 

A DNA sequence (GBSx2104) was identified in S.agalactiae <SEQ ID 6157> which encodes the amino 
acid sequence <SEQ ID 6158>. This protein is predicted to be a membrane protein. Analysis of this protein
sequence reveals the following: 

Possible site: 53 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.21 Transmembrane 94 - 110 ( 93 - 115)
INTEGRAL Likelihood = -4.14 Transmembrane 172 - 188 ( 166 - 188)
INTEGRAL Likelihood = [illegible] Transmembrane 130 - 146 ( 129 - 149)
INTEGRAL Likelihood = [illegible] Transmembrane 62 - 78 ( 62 - 79)

Final Results

bacterial membrane --- Certainty=0.3484 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA80247 GB:Z22520 membrane protein [Bacillus acidopullulyticus] 
Identities = 47/185 (25%) , Positives = 80/185 (42%) , Gaps = 23/185 (12%) 

Query: 1 MKKKNKSSNIAIIAIFFAIMLVIHFLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGA 60 

MKK +11 + A+ 4+4- T4MHIP II I GP +G 

Sbjct: 1 MKKSLTVRDIVIAGVLGAVAILLGVTRLGYIPVPTAAGNATIMHIPAIIGGIMQGPWGL 60 

Query: 61 TLGALMGGIS VANSS I VLLPTSYLFSPFVENGNFYSLI IALVPRILIGI I PYFVYKLLHN 120 

4GA4 G S N444 L F +++++PR+ IG++ + VY + 

Sbjct: 61 IVGAI FGI SSFLNATVPL FKDPLVSILPRLFIGWAWLVYIGIRR 105 

Query: 121 R FGlAISGAIGSLTNTVFVXiSGIFIFFSSTYNGNIKLMIAGIISSKSLAEMVIAAII 177 

+ + +S IG4LTNT VL4 F 4 +A 4N L E V4 14 
Sbjct: 106 KSEWAVGLSAFIGTLTNTALVLA- -MAOTRHYLTAGVAWTVA ITNGLPEAWGTI V 160 

Query: 178 VYLTV 182 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6159> which encodes the amino acid 

sequence <SEQ ID 6160>. Analysis of this protein sequence reveals the following: 

Possible site: 31 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -8.97 Transmembrane 18 - [illegible]
INTEGRAL Likelihood = -7[illegible] Transmembrane 170 - [illegible]
INTEGRAL Likelihood = -5[illegible] Transmembrane 96 - [illegible]
INTEGRAL Likelihood = -4[illegible] Transmembrane 140 - [illegible]
INTEGRAL Likelihood = -3[illegible] Transmembrane 64 - [illegible]
INTEGRAL Likelihood = -0[illegible]

Final Results

bacterial membrane --- Certainty=0.4588 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA80247 GB:Z22520 membrane protein [Bacillus acidopullulyticus] 
Identities = 47/193 (24%) , Positives = 86/193 (44%) , Gaps = 28/193 (14%) 

Query: 8 RKSADISRIAIFFAIMLVIHFVSSLVFNIWPIPI KPTLVHIPVIIASVLYGPRIGAI 64 

+KS + I I + V + P+P T++HIP II ++ GP +G I

Sbjct: 2 KKSLTVRDIVIAGVLGAVAILLGVTRLGYIPVPTAAGNATIMHTPAI IGGIMQGPWGLI 61 

Query: 65 LGGLMGI IS VTTNTI ILLPTNYLFSPFVDHGTFASLIIAIIPRILIGITPYYCYKLIPNQ 124 
+G 4 GI S + T+ L F +4+I+PR+ IG+ 4 Y I 4 

Sbjct: 62 VGAIFGISSFLNATVPL FKDPLVSILPRLFIGWAWLVYIGIRRK 106

Query: 125 FGLIVSGI IGSLTNTIFVLS-GIFIFFATVFDGNIKALLTAIISSNAIVEMIISAII 180 

4 G4 IG4LTNT VL4 +F 4 T 4 + +N 4 E ++ 1+ 

Sbjct: 107 SEYVAVGLSAFIGTLTNTALVLAMAVFRHYLTA GVAWTVAITNGLPEAWGTIV 160 


Query: 181 TFVLIPTLSRLKR 193 
Sbjct: 161 TLAWLAWKQIGR 173 
An alignment of the GAS and GBS proteins is shown below.

Identities = 121/184 (65%), Positives = 157/184 (84%) 

Query: 6 KSSNIAIIAIFFAIMLVIHFLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGATLGAL 65 
KS++I+ IAIFFAIMLVIHF+SS +F+ W +PIKPTL+HIPVI IAS+ YGPRIGA LG L

Sbjct: 9 KSADISRIAIFFAIMLVIHFVSSLVFNIWPIPIKPTLVHIPVIIASVLYGPRIGAILGGL 68 

Query: 66 MGGI SVANSSI VLLPTSYLFSPFVENGNFYSLI IALVPRILIGI I PYFVYKLLHNRFGLA 125 
MG ISV +4-I+LLPT+YLFSPFV++G F SLIIA++PRILIGI PY+ YKL+ N+FGL 
Sbjct: 69 MGIISVITNTIILLPTNYLFSPFVDHGTFASLIIAIIPRILIGITPYYCYKLIPNQFGLI 128

Query: 126 ISGAIGSLTOTVFVlSGIFIFFSSTYNGNIKimAGIISSNSLAEMVIAAIIvYLTVPRI 185 

+SG IGSLTNT+FVLSGI FI FF++ ++GNIK +L IISSN++ EM+I+AII ++ +P + 
Sbjct: 129 VSGIIGSLTNTIFVLSGIFIFFATVFDGNIKALLTAIISSNAIVEMIISAIITFVLIPTL 188 


Query: 186 LNIK 189 
+K 

Sbjct: 189 SRLK 192 

A related GBS gene <SEQ ID 8949> and protein <SEQ ID 8950> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 13.42
GvH: Signal Score (-7.5): -1.93
Possible site: 53

>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 2 value: -6.21 threshold: 0.0

INTEGRAL Likelihood = -6.21 Transmembrane 94 - 110 ( 93 - 115)
INTEGRAL Likelihood = -0.16 Transmembrane 62 - 78 ( 62 - 79)
PERIPHERAL Likelihood = 1.70 123

modified ALOM score: 1.74

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3484 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF0156K301 - 723 of 1017) 

EGAD|38021|39600(1 - 129 of 183) hypothetical membrane protein {Bacillus acidopullulyticus}
GP|806536|emb|CAA80247.1||Z22520 membrane protein {Bacillus acidopullulyticus}
%Match = 7.6
%Identity = 29.7 %Similarity = 53.9

Matches = 38 Mismatches = 57 Conservative Sub.s = 31 

162 192 222 252 282 312 342 372 

KKIGYQEIEPRISLIiACGDTGQGALADISTILKCIQEVAN*AVNLYTISSLI*GVIMKKKNKSSNIAIIAIFFAIMLVIH 
50 HI :| | :: |: ::: 

MKKSLTVRDIVIAGVLGAVAILLG 



FLSSFIFSFWLVPIKPTLMHIPVIIASIAYGPRIGATLGALMGGISVANSSIVLLPTSYLFSPFVENGNFYSLIIALVPR 
Mill II I II =1 :|h I I h = : I I = = = ='=" I 1 

VTRLGYI PVPTAAGNATIMHI PAI IGGIMQGPWGLI VGAI FGI S S FLNATVPL FKDPLVSILPR 



ILIGIIPYFVY KLLHI^FGLAISGAIGSLTOT^'F^/XSGIFIFFSSTYNGNIKLKLAGI^SXNSLAEMVIAAIIVYLT 

::||:: ::|| = = =1 Ihllll =1 = 

LFIGvVAV&VYIGIRRKSEYVRVGLSAFIGTLT^ 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1995 

A DNA sequence (GBSx2105) was identified in S.agalactiae <SEQ ID 6161> which encodes the amino 
acid sequence <SEQ ID 6162>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence (or aa 1-18) 



Final Results

bacterial cytoplasm --- Certainty=0.0165 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC44502 GB:U48885 DNA/pantothenate metabolism flavoprotein
[Streptococcus mutans] 
Identities = 101/145 (69%) , Positives = 122/145 (83%) 

Query: 1 MIKRITLAVTGSISAYKAADLTSQLTKIGYDVHIIMTQAATHFITPLTLQVLSKNPIHLD 60

M K+I LAV+GSI+AYKAADL+ QLTK+GY V++ MT AA +FI PLTLQVLSKNP++ + 
Sbjct: 1 MTKKILLAVSGSIAAYKAADLSHQLTKLGYHVNVFMTNAAKQFIPPLTLQVLSKNPVYSN 60 

Query: 61 VMDEHNPKIINHIEIAI^TDLFIVAPASANTIAHIAYGFADNIvTSVALAMPDETPKLIA 120 
VM E +P++INHI LAK+ DLF++ PASANT+AHLA+GFADNI VTSVALA+ P E PK A

Sbjct: 61 WKEDDPQVINHIAIJ^OADLFLLPPASANTI^I^GFADNIWSVALALPLEVPKFFA 120 

Query: 121 PAMNTKMYHNTITQRNID1LKK1GY 145 
PAMNTKMY N ITQ NI +LKK GY 
Sbjct: 121 PAMNTKMYENPITQSNITLLKKFGY 145

A related DNA sequence was identified in S.pyogenes <SEQ ID 6163> which encodes the amino acid
sequence <SEQ ID 6164>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0076 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 146/178 (82%) , Positives = 155/178 (87%) 

Query: 1 MIKRITLAVTGSISAYKAADLTSQLTKIGYDVKIIMTQAATEFITPLTLQVLSKNPIHLD 60 

M K ITLAV+GSISAYKAADLTSQLTKIGYDVHIIMTQAAT+FITPLTLQVLSKN IHLD 
Sbjct: 1 MTKHITLAVSGSISAYKAADLTSQLTKIGYDVHIIMTQAATQFITPLTLQVLSKNAIHLD 60 

Query: 61 VMDEHNPKIINHIEIAKRTDLFIVAPASANTIAHLAYGFADNIWSVALAMPDETPKIjIA 120 

VTOEH+PK+INHIELAKRTDLFIVAPASANTIAHLAYGFADN+OTSVALA+P TPKLIA 
Sbjct: 61 VMDEHDPKVINHIEIJAKRTDLFIVAPASANTIAHLAYGFAD^ILVTSVAIJALPATTPKIlIA 120 

Query: 121 PAMNIKMYHNTITQPJSIIDILKKIGYQEIEPRISIjIACGDTGQGALADISTILKCIQEV 178 

PAMNTKMY N ITQ NI L IG+ EI P+ SLLACGD G GALADI Hit 
Sbjct: 121 PAMNTKMYQNPITQENIKRLS'TIGFTEIPPKSSIjLACGDKGPGAIjADIDVIIiATIDTI 178 



SEQ ID 6162 (GBS236) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 5; MW 21.6kDa). 
Purified GBS236-GST is shown in Figure 208 (lane 6) and in Figure 225 (lanes 4-5).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1996 

A DNA sequence (GBSx2106) was identified in S.agalactiae <SEQ ID 6165> which encodes the amino 
acid sequence <SEQ ID 6166>. This protein is predicted to be pantothenate metabolism flavoprotein 
homolog (dfp). Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2325 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9835> which encodes amino acid sequence <SEQ ID 9836> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG39941 GB:AF301375 MTW1216 [Methanothermobacter wolfeii
prophage psiM100]

Identities = 71/229 (31%), Positives = 117/229 (51%), Gaps = 27/229 (11%) 

Query: 6 MKILITSGGTTEKIDTVRSITNHATGTLGKIlAEKXl^GHQOTLVTTKmVKPESATNL 65

+++L++ GGT E IDVE ITU ++G +G +A + +G VTLV V + + L 

Sbjct: 172 LRVlVSLGGTLEPIDPWVITl<nRSSGRMGIAVAREAYIQGADVTLVA--GTVSVDIPSQL 229 

Query: 66 STFEIEDVDSLIKTLKPLVKEHDILIHSMAVSDYTPVYMADFEKVKSSDHLDTFLRKDNH 125 

T E + + + L+ EHD+ + + AVSD+ PVY 
Sbjct: 23 0 RTVRAETAHEMAEAVAELIGEHDVFVSAAAVSDFRPVYS 268 

Query: 126 F^KISSESEYQvLFLKKTPKVISLWKWNPQITLVGFIOjLvNvTKENLFKVARHSLIKNK 185 

E KISS+SE L LK PK+I + ++ NP+ +VGFK V++E L AR + + 
Sbjct: 269 EEKISSDSEI-TLRLKPNPKIIRMARETNPFAFIVGFKAEHGVSEEELIAAARKQIEDSV 327 

Query: 186 ATFILA1TOL-IDITSKHHIAYLLDHDNVYKATT--KEDIAQLIYEKVKK 231 

A ++AND+ ++ + ++ + V + T KE++A LI ++ K 

Sbjct: 328 7ADMWANDVSVEGFGSENIIRAIIVSEGVTELP-MKKEEIAGLIIGEIMK 376 

A related DNA sequence was identified in S. pyogenes <SEQ ID 6167> which encodes the amino acid 
sequence <SEQ ID 6168>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1737 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 142/230 (61%), Positives = 170/230 (73%) 

Query: 4 J^KILITSGGTTEKIDTvRSiraHATGTI^KIIAEKYLREGHQvTLVTTKimVKPESAT 63 

M MK++ITSGGTTE ID VR ITNH+TG LGK+I E++L+ H VTLVTTK A KP 
Sbjct: 1 MTMKLIITSGGTTEPIDAVRGITNHSTGQLGKLITERFLQYHHDVTLVTIKTATKPLPNK 60 
Query: 64 NLSTFEIEDVDSLIKTLKPLVKEHDILIHSI^VSDYTPWMADFEKVKSSDHLDTFLRKD 123

L E4E V+ L+ LK V HDILIHSMAVSDYTPVYM D E+V 4D4L4 FL 4 
Sbjct: 61 RLRIIEVETV1TOLMAALKDQVPHHDILIHSMAVSDYTPVYMTDLEQVSQADNLNCFLCEH 120 

Query: 124 NHEGKISSESEYQVLFLICKTPICVISLVKKIMPQITLVGFKLLYNVTKENLFKVARHSLIK 183 

N E KISS S4YQVLFLKKTPKVIS VK+WNP I LVGFKLLVNV +E L KVAR SL K 
Sbjct: 121 NSEPKISSASDYQVLFLI<KTPKVISY\W^PNII<IiVGFI<LLVirVPQEELIKVARASLAK 180 

Query: 184 NI<ATFIIiMTOLIDITSKIlHIAYLLDHDlWYKATTKEDIAQLIYEKVKKYD 233 

N A +ILANDL+DI + H A L+ ++ V A TKE IA L+YE++ K+D 
Sbjct: 181 NHADYILA1TOLVDIQTGMHI(ALLISNNEVASADTKEAIADLLYERMTI<HD 230 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1997 

A DNA sequence (GBSx2107) was identified in S.agalactiae <SEQ ID 6169> which encodes the amino 
acid sequence <SEQ ID 6170>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.22 Transmembrane 117 - 133 ( 117 - 133) 

Final Results 

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9833> which encodes amino acid sequence <SEQ ID 9834> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07541 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 94/221 (42%) , Positives = 133/221 (59%) , Gaps = 2/221 (0%) 

Query: 52 AEKPFIWTEVFLREINRSNQEIILHIWPMTKTVILGMLDRELPHLELAKKEIISRGYEPV 111 

A+F + + I+S LW TV+LG+ D LP ++ 4 + ++ + 

Sbjct: 27 ALQSFAYDDTLCTSIGKSQSPPTLRAWVHHNTVVLGIQDSRLPQIKAGIEALKGFQHDVI 86 

Query: 112 VRNFGGLAWADEGIIjNFSLVIPDVFERKLSISBGYLIMVDFIRSIFSDFYQPIEHFEVE 171 

VRN GGLAW D GILN SLV+ + E+ SI DGY +M + I S+F D + IE E+ 
Sbjct: 87 VRNSGGIAWLDSGILNLSLVLKE--EKGFSIDEGYELMYELICSMFQDHREQIEAREIV 144 

Query: 172 TSYCPGKFDLSINGKKFAGIAQRRIKNGIAVSIYLSVCGDQKGRSQMISDFYKIGLGDTG 231 

SYCPG +DLSI+GKKFAG+4QRRI+ G4AV IYL V G R44MI FY 4 
Sbjct: 145 GSYCPGSYDLSIDGKKFAGISQRRIRGGVAVQIYLCVSGSGAERAKMIRTFYDKAVAGQP 204 

Query: 232 SPIAYPNVDPEIMANLSDLLDCPMTVEEVIDRMLISLKQVG 272 

4 YP 4 PE MA4LS4LL P V DV4 4 L44L4Q G 
Sbjct: 205 TKFVYPRIKPETMASLSELLGQPHNVSDVLLKALMTLQQHG 245 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6171> which encodes the amino acid 
sequence <SEQ ID 6172>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

N-term signal seq 

' HI ( 95 - 111) 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB07541 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 97/228 (42%) , Positives = 138/228 (59%) , Gaps = 2/228 (0%) 

Query: 30 ALSPEWTEVFLKTINQEPNQLILHIWPMTRTVILGMLDRQLPYFELMCTEIGNNGYVPV 89 

ALF + + +I+ + LW TV+LG+ D +LP + + + + 

Sbjct: 27 ALQSFAYDDTLCTSIGKBQSPPTLRAWVHHNTWLGIQDSRLPQIKAGIEALKGFQHDVI 86 

Query: 90 TRNIGGLAWADDGILNFSLVIPDHFSESISISNAYLIMVDVIRESFSDYYQRIEYHEIK 149 

RN GGLAW D GILN SLV+ + + SI + Y +M ++I F D+ ++IE EI 
Sbjct: 87 VRNSGGLAWLDSGILNLSLVLKEE--KGFSIDDGYELMYELICSMFQDHREQIEAREIV 144 



Query: 210 TKVNYPQIDPECMATLSELLETPFTVAEVLERLRLTLRQLGFSLTEKS 257 

TK YP+I PE MA+LSELL P V++VL + +TL+Q G SL +S 
Sbjct: 205 TKFVYPRIKPETMASLSELLGQPHNVSDVLLICALMTLQQHGASLLTES 252 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 155/275 (56%), Positives = 199/275 (72%), Gaps = 8/275 (2%) 

Query: 32 QDIAQIiPVSIFKDYVTI^QDAEKPFIVnEWIjREINRSNQEIILHIWPMTKTVILGMLDR 91 

+DLA LP+ ++ D A PF+WTEVFL+ IN+ ++ILHIWPMT+TVILGMLDR 

Sbjct: 10 RDLASLPIFvYGDGNKKVPGALSPFVWTEVFLKTINQEPNQLILHIWPMTRTVILGMLDR 69 



Query: 92 ELPHLEIiAKKEI I SRGYEPWRNFGGJ'iAVVAD£,GI LNFSLVI PDVFERKLS ISDGYLIMV 151
+LP+ ELAK EI + GY PV RN GGLAWAD+GILNFSLVIPD F +SIS+ YLIMV
Sbjct: 70 QLPYFELAKTEIGNNGYVPVTRNIGGLAWADDGILNFSLVIPDHFSESISISNAYLIMV 129

Query: 152 DFIRSIFSDFYQPIEHFEVETSYCPGKFDLSINGKKFAGLAQRRIKNGIAVSIYLSVCGD 211
D IR FSD+YQ IE+ E++ SYCPG FDLSI G+KFAG+AQRRIK GI VSIYLSVCGD
Sbjct: 130 DVIRESFSDYYQRIEYHEIKNSYCPGNFDLSIAGRKFAGIAQRRIKKGIWSIYLSVCGD 189

Query: 212 QKGRSQMISDFYKIGLGDTGSPIAYPNVDPEIMANLSDLLDCPMTVEDVIDRMLISLKQV 271
Q R Q+I DFY+ G + + YP +DPE MA LS+LL+ P TV +V++R+ ++L+Q+
Sbjct: 190 QAARGQLIKDFYE^GTCGEVTKVNYPQIDPECmTLSELLETPFTVAEVLERLRLTLRQL 249

Query: 272 GFN DRLLMIRPDLVAEFNRFQAKSMftNKG 300
GF+ D+ L+ D V + R Q + + +G
Sbjct: 250 GFSLTEKSPDQALLTNFDAV--YERMQLEWRKEG 282



A related GBS gene <SEQ ID 8951> and protein <SEQ ID 8952> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10 
McG: Discrim Score: -16.85 
GvH: Signal Score (-7.5): -5.07 

Possible site: 49 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -0.22 threshold: 0.0 

INTEGRAL Likelihood = -0.22 Transmembrane 117 - 133 ( 117 - 133) 
PERIPHERAL Likelihood = 0.47 73 
modified ALOM score: 0.54 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01564(451 - 1116 of 1518) 
EGAD|13388|BS3758(27 - 249 of 281) hypothetical 31.4 kd protein in pta 3' region {Bacillus subtilis}
OMNI|NT01B£4391 hypothetical protein
SP|P39648|YWFL_BACSU HYPOTHETICAL 31.4 KDA PROTEIN IN PTA 3'REGION.
GP|414014|emb|CAA51646.1||X73124 ipa-90d {Bacillus subtilis}
GP|2636300|emb|CAB15791.1||Z99123 alternate gene name: ipa-90d {Bacillus subtilis}
PIR|S39745|S39745 ywfL protein - Bacillus subtilis
%Match = 15.8
%Identity = 40.8 %Similarity = 61.0

Matches = 91 Mismatches = 82 Conservative Sub.s = 45 

321 351 381 411 441 471 501 531 

15 *WSILRETYWKISSDCDKINIAEFSRERMSDLLEWQDIA^^ 

II- I 

MANQPIDLLMQPKWRVIDQSSLGPLFDAKQSFAMDDTLCMSVGKGVSPATARS 

10 20 30 40 50 

20 561 591 621 651 681 711 738 768 

WPMTKTVII^MLDRELPHLEIAKKEIISRGYEPVVRMFGGIAWADEGIIMFSLVIPDVFERK-LSISDSYLIMVDFIRS 
I l==ll= I || h =111 =111 IMII1 1 = 1 = 11 11= I 1 = 1 = I II ll = = = l 

WVHHDTIVLGIQDTRLPFLQDGISLLESEGYRVIVRNSGGLAWLDDGVLNISLIFED- -EKKGIDIDKGYEAMVELMRR 
70 80 90 100 110 120 130 

25 

798 828 858 888 918 972 996 

IFSDFYQPIEHFEVETSYCPGKFDLSINGKKFAGLAQRRIKNGIAVSIYLSVCGDQKG--RSQMISDFYKIGLGD--TGS 
= : = II =1 = 1 Mill = I I I I I I I I I I I = = I I I = = 1 = 11 III I 1= I 1= =1 11= I I 
MLRPYNAKIEAYEIEGSYCPGSYDLSINGKKFAGISQRRVRGGVAVQIYL--CADKSGSERADLIRRFYQAALKDKQNDK 
30 150 160 170 180 190 200 

1026 1056 1086 1116 1146 1176 1206 1236 

PIAYPNVDPEIMANLSDLLDCPMTVEDVIDRKLISLKQVGFNDRLLMIRPDLVAEFNRFQAKSMANKGMVSRI)E*CPR*F 
II = II 11=11=11 ==l=l== =1 II : 
35 KGVYPEIRPETIVmSLSELLQKDISVQDLMFALLTELKALSTHLYSAGLSIDEEMEFEKNLvRMAERNAKVFG 
220 230 240 250 260 270 280 

SEQ ID 8952 (GBS390) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 7; MW 37kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 82 (lane 3; MW 62kDa). 

GBS390-GST was purified as shown in Figure 216, lane 12.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1998 

A DNA sequence (GBSx2108) was identified in S.agalactiae <SEQ ID 6173> which encodes the amino 
acid sequence <SEQ ID 6174>. This protein is predicted to be probable trimethylamine dehydrogenase
(nemA). Analysis of this protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2218 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
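
The localization report above can be read programmatically. The following Python sketch is illustrative only and is not part of the original analysis: the function names `parse_localization` and `most_likely_site` are hypothetical, and the regular expression is written to tolerate the stray spaces that appear inside the certainty values in this text.

```python
import re

# One report line: "bacterial <site> ... Certainty=<value> (<verdict>)".
# The value may contain stray spaces (e.g. "0 .2218"), so spaces are allowed and removed.
LINE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)")

def parse_localization(text):
    """Map each predicted site to (certainty, verdict)."""
    scores = {}
    for site, number, verdict in LINE.findall(text):
        scores[site] = (float(number.replace(" ", "")), verdict.strip())
    return scores

def most_likely_site(scores):
    """Return the site with the highest certainty value."""
    return max(scores, key=lambda site: scores[site][0])

report = """
bacterial cytoplasm --- Certainty=0 .2218 (Affirmative)
bacterial membrane --- Certainty=0 .0000 (Not Clear)
bacterial outside --- Certainty=0 . 0000 (Not Clear)
"""
scores = parse_localization(report)
print(most_likely_site(scores), scores)
```

Under these assumptions, the site with the largest certainty value is taken as the predicted location; ties are not handled specially.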

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA83700 GB:Z33015 similar to trimethylamine DH [Mycoplasma capricolum]

Identities = 162/311 (52%), Positives = 219/311 (70%), Gaps = 1/311 (0%) 

Query: 3 OTQGHLFRPLTLPNGLSLENRFVLSP^-TNSSTSEGEnn'DDDIAYAVRRAKSAPLQITGA 62 

N LF P L NG LENRFVLSPM + +T +G +TD + Y RR+ SAPLQITG 
Sbjct: 2 NKYEKLFEPFYL-NGFKLENRFVLSPMTLSIATLDGKITDKEADYVKRRSHSAPLQITGG 60 

Query: 63 AYITEYGQLFEYGFSVSKDEDIPGLTKLAKAMKSKGAKAVLQLTHAGRFSSHTLARHGYV 122 

Y E+GQLFEYG S D+DIP LT+L + MK+ +LQL HAG+FS +L ++GY+ 

Sbjct: 61 WFDEFGQLFEYGISAKSDDDIPSLTRLYQEMKTDSNCVILQLAHAGKFSKTSLBaCYGYL 120 

Query: 123 YGPSPMQLQSPYPHQVKELTHKDILRIIDEYVQATRRAIQAGFDGVEISSAQRLLIQTFF 182 

YGPS + +P H+V EL + I +11 +Y AT R I+AGF+G+EIS AQRLLIQTFF 
Sbjct: 121 YGPSYEKNHTPIEHEVLELPKEKIKQIIQDYKDATLRVIKAGFNGIEISMAQRLLIQTFF 180 

Query: 183 STFSNQRKDEYGPQTLTNRCRLGLEVFKAVQKVIREEAESDFILGFRATPEETRGSQIGY 242 

S N+R DEY NR R LEV KA+++VI + A +FI GFRATPEET G +GY 

Sbjct: 181 SQIINKRTDEYSATNFENRSRFCLEWKAIREVIDKYAPKNFIFGFRATPEETYGDILGY 240 

Query: 243 SIEEFMEFLEKILAIAQVDYLAIASWGHDVFRNTIRSEGVYKGQLVNQVIFEHFGDRVPI 302 

+ IE+F++ ++KI + I ++ YLAI ASWGHD+ + N +RS YKGQL VN+ VI + + + +++PI 
Sbjct: 241 TIEDFIQLVBKIIEIGKISYIAIASWGHDIYLNKVRSNTKYKGQLVNKVIYDIYKNKLPI 300 

Query: 303 MATGGINSASK 313

+++GGIN+ +K 
Sbjct: 301 ISSGGINTPTK 311 
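
The summary line that heads each alignment (Identities, Positives, Gaps) can be checked mechanically. The following Python sketch is illustrative and not part of the original analysis: the function name `parse_alignment_summary` is hypothetical, and it simply recomputes each percentage from the reported counts so that transcription errors can be spotted. How the reported percentages were rounded is not stated in this text, so the sketch returns the unrounded ratio alongside the reported figure.

```python
import re

# Matches fields such as "Identities = 162/311 (52%)", tolerating extra spaces.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)

def parse_alignment_summary(line):
    """Return {field: (count, length, reported_percent, computed_percent)}."""
    stats = {}
    for name, count, length, reported in SUMMARY_FIELD.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, int(reported), 100.0 * count / length)
    return stats

summary = "Identities = 162/311 (52%), Positives = 219/311 (70%), Gaps = 1/311 (0%)"
stats = parse_alignment_summary(summary)
for name, (count, length, reported, computed) in stats.items():
    print(f"{name}: {count}/{length} reported {reported}%, computed {computed:.1f}%")
```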

A related DNA sequence was identified in S.pyogenes <SEQ ID 6175> which encodes the amino acid 
sequence <SEQ ID 6176>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3055 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 265/390 (67%), Positives = 321/390 (81%) 

Query: 8 LFRPLTLPNGLSLENRFVLSPMVTNSSTSEGFVTDDDIAYAVRRAKSAPLQITGAAYITE 67 

LF PLTLPNG L+NRFVLSPMVTNSST +G+VT DD++YA+RRA SAPLQITGAAY+ 
Sbjct: 8 LFEPLTLPNGSQLDNRFVLSPMVTNSSTKDGYVTQDDVSYALRRAASAPLQITGAAYVDP 67 

Query: 68 YGQLFEYGFSVSKDEDIPGLTKLAKAMKSKGAKAVLQLTHAGRFSSHTLARHGYVYGPSP 127

YGQLFEYGFSV+KD DI GL +LA+AMK+KGAKAVLQLTHAGRF+SH L ++G+VYGPS
Sbjct: 68 YGQLFEYGFSVTKDADISGLKELAQAMKAKGAKAVLQLTHAGRFASHALTKYGFVYGPSY 127

Query: 128 MQLQSPYPHQVKELTHKDILRIIDEYVQATRRAIQAGFDGVEISSAQRLLIQTFFSTFSN 187 

MQL+SP PH+VK LT + I +1 Y QATRRAIQAGFDGVE+SSAQRLLIQTFFSTFSN 
Sbjct: 128 MQLRSPQPHEVKPLTGQQIEELIAAYAQATRRAIQAGFDGVEVSSAQRLLIQTFFSTFSN 187

Query: 188 QRKDEYGPQTLTNRCRLGLEVFKAVQKVIREEAESDFILGFRATPEETRGSQIGYSIEEF 247

+R D YG QTL NR +L L V +AVQ+VI++EA FI GFRATPEETRG+ IGYSI+EF 
Sbjct: 188 KRTDSYGCQTLFNRSKLTLAVLQAVQQVIKQEAPDGFIFGFRATPEETRGNDIGYSIDEF 247 

Query: 248 MEFLEKILAIAQVDYLAIASWGHDVFRNTIRSEGVYKGQLVNQVIFEHFGDRVPIMATGG 307 

++ ++ +L +A++DYIAIASWG VFRNT+RS G Y G+ VNQV+ ++ +++P+MATGG 
Sbjct: 248 LQLMDWVIJWAKLDYIAIASWGRHVFRNTTOSPGPYYGRRvNQVVRDYLRNKLPVMATGG 307 

Query: 308 INSASKVFEALQHAHMIGASTPLVVDPEFLQKIKAKCSDQINLRIKVSDLEGIAIPKASF 367 

+N+ K EAL HA IG STP WDPEF KIK C + I+LRI+ +DL+ LAIP+ASF 
Sbjct: 308 MOT'PDKAIEALAHADFIGVSTPFVVDPEFAHKIKEGCEESIHLRIRPADLKSLAIPQASF 367 

Query: 368 KDIVPLMDYGESLPKEAREVFRELRSNYRE 397 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1999 

A DNA sequence (GBSx2109) was identified in S.agalactiae <SEQ ID 6177> which encodes the amino 
acid sequence <SEQ ID 6178>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3748 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MKLSVLDYGLIDYGKTASDAIQETILLSQEAERLGYHQFWVAEHHGVKAFSISNPELMIM 60 

MKLSVLD I YG A +A+++T L++ E LGYH+FWV+EHH + S+PE++I 

Sbjct: 1 MKLSVIDQSPIAYGSNAKEALRQTTEIAKVTEALGYHRFWVSEHHDASTLAGSSPEVLIA 60 

Query: 61 HLANQTKSIKIGSGGIMPLHYSSFKLAETLKTLETCHPNRVSIGLGNSLGTVKVSNALRS 120

HLA TK I++GSGG+M HYS++K+AE K LE HP R+ +GLG + G + ++ 
Sbjct: 61 HLAAHTKKIRLGSGGVMLPHYSAYKVAENFKLLEALHPGRIDVGLGRAPGGMPIAKMALQ 120 

Query: 121 LHK AHDYEEVLEELKSWLIDESSSKEPL VQPTLSSFPDLYVLGSGQKSAYLRA 173 

K H Y ++++ +L D+ + P + + PD+++LGS SA +AA 

Sbjct: 121 EGKEQNIHKYPLQVKDVIGYLQDDLPTDHRFHGLKATPLIDTVPDVWLLGSSGGSANVAA 180 

Query: 174 KLGLGFTFGVFPFMDKDPLTEAKKLSSLYYHQFEEYYPNKSPWLMVA^FWIADTSEEAE 233 

+ G GF F F4+ + +A + Y F+ P VA FV+ ADT E+A+ 

Sbjct: 181 ENGTGFAFA- -HFINGEGGVQAVE SYRETFQPSALFDRPQTSVAIFVICADTDEQAD 235 

Query: 234 NIAKTLDIWMLGNKDFNEFATFPTIEEANHYQLTPSQKAKIKSNRDRM1VGDPKQVKESL 293 

IA +LD+ ++ ++ P+IE A Y +P ++A+I+ MR RMIVG PK V++ L 

Sbjct: 236 QIASSLDLSLIMLENGQLSKGTPSIESALSYPYSPFERARIRENRKRMIVGSPKAVRQQL 295 

Query: 294 DALVNASQAEELLLIPLVPGLDQRIKSLKLLSQ 326 

L A + EE++++ + + RI+S +IiL + 
Sbjct: 296 VELARAYETEEVIWTITHRFEDRIRSYELLGE 328 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6179> which encodes the amino acid 
sequence <SEQ ID 6180>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.60 Transmembrane 212 - 228 ( 210 - 229) 

Final Results 

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 173/329 (52%), Positives = 241/329 (72%), Gaps = 1/329 (0%)
Query: 1 MKLSVLDYGLIDYGKTASDAIQETILLSQEAERLGYHQFWVAEHHGVKAFSISNPELMIM 60
Query: 61 HLANQTKSIKIGSGGIMPLHYSSFKLAETLKTLETCHPNRVSIGLGNSLGTVKVSNALRS 120
HLA+ TK I + IGSGGIMPLHYSSFK+AE + TLE HPNR+ 4G+GNSLGT V AL S

Sbjct: 61 HLADHTKQIRIGSGG1MPLHYSSFKIAEWIMTLEALHPNRIDLGIGNSLGTTLVQRALSS 120 

Query: 121 LHKAHDYEEVLEELKSWLIDESSSKEPL-VQPTLSSFPDLYVLGSGQKSAYLAAKLGLGF 179
+H Y +V+ EL +L + S P+ V P +++P ++ L + ++A LA +LGLG+
Sbjct: 121 IHCKDSYSQWTELYQYLNPDHLSPLPIFVNPRGNTYPQIWTLSNSLETAELAGQLGLGY 180

Query: 180 TFGVFPFMDKDPLTFAKKLSSLYYHQFEEYYPNKSPMLMVAAFWIADTSEEAENIAKTL 239 

TFG+FP++ KDP+TEAK++S+ Y F K P L++A F+V++DT E+AE +AK L 

Sbjct: 181 TFGIFPYIPKDPITEAKRVSAHYRKAFRPSKLLKIPKLILAVFIVLSDTDEKAEALAKPL 240 


Query: 240 DIWMLGNKDFNEFATFPTIEEANHYQLTPEQKAXIKSNRDRMIVGDPKQVKESLDALVNA 299 

DIWMLG +DFNEF T+P +EEA +Y LT +Q+ I +NR RM++G P VK+ LD L+ A 
Sbjct: 241 DIWMLGQQDFNEFKTYPDVEEARNYHLTEKQREAIAANRSRMVIGSPHTVKKQLDRLIEA 300 

Query: 300 SQAEELLLIPLVPGLDQRIKSLKLLSQLY 328

QA+ELL IPLVP R ++L+LL4 LY 
Sbjct: 301 CQADELLAI PLVPEFANRQRTLELLADLY 329 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 2000 

A DNA sequence (GBSx2110) was identified in S.agalactiae <SEQ ID 6181> which encodes the amino 

acid sequence <SEQ ID 6182>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2384 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF81345 GB:AC007767 Identical to a glycine cleavage system
H-protein precursor from Arabidopsis thaliana gb|P25855.
It contains a glycine cleavage H-protein domain
PF|01597. ESTs gb|R90208, gb|AI
Identities = 30/91 (32%), Positives = 53/91 (57%), Gaps = 1/91 (1%)

Query: 18 TISLTPELQDDLGTVGYVEFTD-DANLETODVIMIEASKTVMAILSPLTGKV\''KVNTAA 76 
TI +T QD LG V +VE + ++++ + +E+ K ILSP++G+V++VNT

Sbjct: 59 TIGITDHAQDHLGEWFVELPEANSSVSKEKSFGAVESVKATSEILSPISGEVIEVNTKL 118 

Query: 77 SQEPTLLNSEKADENWLWLTEVDYAAFEAL 107 
++ P L+NS ++ W++ + A EAL 

Sbjct: 119 TESPGLINSSPYEDGWMIKVKPSSPAELEAL 149

A related DNA sequence was identified in S.pyogenes <SEQ ID 6183> which encodes the amino acid 
sequence <SEQ ID 6184>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence



Final Results

bacterial cytoplasm --- Certainty=0.3544 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
An alignment of the GAS and GBS proteins is shown below.

Identities = 80/110 (72%), Positives = 98/110 (88%)

Query: 1 MKKIMTYLLIEKIffiELYTISLTPELQDDLGWGYWFTDDMILEVDDVIIiNIEASKTVMA 60 

MKKIANYLLIEK ++ YTIS+TPELQDD+GT+GY EFTD+ +L VDD+ILN4EASICTVM+ 
Sbjct: 1 MKKIAOTLLIEKTDDRYTISMTPELQDDIGTIGYAEFTDOTiHlAVDDIII^LEASKTVMS 60 

Query: 61 ILSPLTGKOTKVOTAASQEPTLLNSEKADENWLVVLT^ 110 

+LSPL G W+ N AA+ PTLLNSEKA+ENW+WLT+VD AAF+ALE+A 
Sbjct: 61 VLSPlAGAVAffiimAATLTPTLIiNSEKMElWI\AnJTDVI)QAAFDM J EDA 110 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2001

A DNA sequence (GBSx2111) was identified in S.agalactiae <SEQ ID 6185> which encodes the amino 
acid sequence <SEQ ID 6186>. This protein is predicted to be LRP16 (M045). Analysis of this protein 
sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0608 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF15294 GB:AF202922 LRP16 [Homo sapiens] 
Identities = 73/171 (42%), Positives = 98/171 (56%), Gaps = 13/171 (7%)


Query: 88 DICLLQVDAIVNAANSKLLGCFIPNHHCIDNQIHTFAGSRLRLACHQLMTQQGRMEAVGQ 147 

DI L+VDAIVNAANS LLG +D IH AG L C L + + G+ 
Sbjct: 78 DITKLEVDAIVNAftNSSLLG GGGVDGCIHRAAGPLLTDECRTLQSCK TGK 127 

Query: 148 AKLTESYHLPCKYVIHTVGPYVKVDQKPSRIREDLLKSSYKSCLQLAVRANLKTIVFPCI 207

AK+T Y LP KYVIHTVGP + S+ E L+S Y S L L + L+++ FPCI 
Sbjct: 128 AKITGGYRLPAICYVIHTVGPIAYGEPSASQAAE--LRSCYLSSLDLLLEHRLRSVAFPCI 185 

Query: 208 STGEFGFPNQRAAELAVQAILEWQRENQHKL-YIIFNTFTPKDQDIYQKLL 257 
STG FG+P + AAE+ + + EW +++ K+ +1 F KD+DIY+ L

Sbjct: 186 STGVFGYPCEAAAEIVLATLREWLEQHKDKVDRLIICVFLEKDEDIYRSRL 236 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6187> which encodes the amino acid 
sequence <SEQ ID 6188>. Analysis of this protein sequence reveals the following:

d N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1992 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 139/266 (52%) , Positives = 178/266 (66%) , Gaps = 
Sbjct: 1 MPSSFDLLGEMIGLLQTEQLTSSWACPLPKALTKRQDLWRALINQRPALPLSKDYLNLED 60

Query: 57 RYLSHWWWTQKVKTIDVCHQTVYSNVFTYHGDICLLQVDAIVNAANSKLLGCFIPNHHCI 116 

YL W + ++ C +T Y+++F YHGDI L VDAIVHAANS+LLGCF PNH CI 
Sbjct: 51 AYLDDWPASFVPVSVKDCQKINYTSLFLYHGDIRYLAVDAIVNAANSELLGCFSPNHGCI 120 

Query: 117 DNQIHTFAGSRLRIACHQLMTQQGRMEAVGQAKLTESYHLPCKYVIHTVGPYVKVDQKPS 176 

DN IHTFAGSRLRLAC +MT+QGR EA+GQAKLT +YHLP Y+IHTVGP + S 
Sbjct: 121 DNAIHTFAGSRLRLACQAIMTEQGRKEAIGQAKLTSAYHLPASYIIHTVGPRITKGHHVS 180 

Query: 177 RIREDLLKSSYKSCLQLAVRANLKTIVFPCISTGEFGFPNQRAAELAVQAILEWQRENQH 236 

IR DLL Y+S L LAV+A L ++ F ISTGEFGFP + AA++A++ +L+WQ E+ 
Sbjct: 181 PIRADLLARCYRSSLDLAVKAGLTSLAFCSISTGEFGFPKKEAAQIAIKTVLKWQAEHPE 240 

Query: 237 K- -LYIIFNTFTPKDQDIYQKLLLKE 260 

L IFNTFT +D+ +Y L KE 
Sbjct: 241 SKTLTTIFNTFTSEDKALYDTYLQKE 266 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2002 

A DNA sequence (GBSx2112) was identified in S.agalactiae <SEQ ID 6189> which encodes the amino 
acid sequence <SEQ ID 6190>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2171 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6191> which encodes the amino acid 
sequence <SEQ ID 6192>. Analysis of this protein sequence reveals the following: 
Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2477 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 218/284 (76%), Positives = 250/284 (87%)

Query: 4 WKTLEKTNHSQSEILSQLIEESDAIWGIGAGMSAADGFTYIGPRFEEAFPDFIAKYQLL 63 

W T + N +Q+E L+QLI+E+DA+WGIGAGMSAADGFTYIG RFE AFPDFIAKYQ L 
Sbjct: 4 m'TYPQKNLTQAEQI^QLIKEADALVVGIGAGMSAADGFTYIGSRFETAFPDFIAKYQFL 63 

Query: 64 DMLQASLYDFEDWEEYWAFQSRWAUTYLDQPVGQAYLDLKDIIjAKKEYHIITTNADNAF 123 

DMLCASL+DFEDW+EYWAFQSRFVALNYLDQPVGQ+YLDLK+IL K+YHI ITTNADNAF 
Sbjct: 64 DMLQASLFDFEDWQEYWAFQSRFVALNYLDQPVGQSYLDLKEILGTKDYHIITTNADNAF 123 

Query: 124 AVADYNLEKVFHIC^EYGLWQCSQHCHQQTYRNDQAIRQMIAQQKDMKIPSNLIPKCPKC 183 

VA Y+ +FHIQGEYGLWQCSQHCHQQTY++D IRQMI A+QK+MK+ P LIP CP+C 
Sbjct: 124 WVAGYDPHNIFHIQGEYGLWQCSQHCHQQTYKDDTVIRQMIAEQKNMKVPGQLIPHCPEC 183 



Query: 184 DQPFEINKRNEEKG^WEDADFHAQRQRYENFLSQHQNDKVLYLEIGVGHTTPQFIKHPFW 243 
+ PFEINKRHEEKGMVEDADFHAQ+ RYE FLS+H+ KVLYLEIGVGHTTPQFIKHPFW 
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Sbjct: 184 EaPFEINH^EKG^DADFHAQKARYEa.FLSEHKEGKVLYLEIGVGHTTPQFIKHPFW 243 

Query: 244 RFVSLNENSLFVTLNHKHYF.IPQKIRSRSVQLTQHIAELIAEAK 287 

4 VS N N+LFOTLNHKHYRIP IR +S++LT+HIA+LI+ K 
Sbjct: 244 KRVSENPNALFVTLNHKHYRIPLSIRRQSLELTEHIAQLISATK 287 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2003 

A DNA sequence (GBSx2113) was identified in S.agalactiae <SEQ ID 6193> which encodes the amino 
acid sequence <SEQ ID 6194>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1086 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12865 GB:Z99109 similar to lipoate-protein ligase [Bacillus subtilis]
Identities = 130/331 (39%), Positives = 206/331 (61%), Gaps = 5/331 (1%)

Query: 9 NGKRITDGAIAlAMQWILQWFLDDDILFPYyCDPKVEIGKFQNAVIETNQEYLKEHDI 68 

4 4 I D I LA++ Y 4444 + L Y P + IGK QN + E N +Y++E+ I 
Sbjct: 5 DNQNINDPRINLAIEEYCVKHLDPEQQYLLFYVNQPSIIIGKNQNTIEEINTKYVEENGI 64 

Query: 69 PVTORDTGGGAVYVDSGAWICYLMKDHGQ-FGDFKRAYEPAIKALKTLGASSVEMRERN 127 

WRR 4GGGAVY D G +N ++ KD G F +FK+ EP I+AL LG + E+ RN 
Sbjct: 65 IVVRRLSGGGAVYT1DLGNLNFSFITKDDGDSFHNFKKFTEPVIQALHQLGVEA-ELSGRN 123 

Query: 128 DLVIDGKKVSGAAMTIWGRIYGGYSLLLDVDFDAMEKATjNPNRKKIESKGIKSVRSRVG 187
D+V+DG+K+SG A GRI+ +L+ D D + L + KIESKGIKS+RSRV 

Sbjct: 124 D I WDGRKI SGNAQFATKGRI FSHGTLMFDSAIDHWSALKVKKDKIES KGI KS IRSRVA 183 

Query: 188 DIRSHLSEDYRHITTDQFKDLMVCQLLHIDHIDQAKRYHLTEKDWAAIDALADEKYKNWD 247

+1 L + +TT++F+ +++ + + + Y LTEKDW I 4+ E4Y4NWD 

Sbjct: 184 NISEFLDDK MTTEEFRSHLLRHIFNTNDVGNVPEYKLTE1CDWETIHQ1SICERYQNV3D 240 

Query: 248 WNYGNSPQYSYHRDARFPSGTYDFHLEIEKGIITNCRIYGDFFSSKDISDIENLLIGCPM 307 

WNYG SP+++ 4 R+P G+ D HLE44KG I 4C4I4GDFF D4S4IENLL+G 
Sbjct: 241 WNYGRSPKFNLNHSKRYPVGSIDLHLEVKKGKIEDCKIFGDFFGVGDVSE1ENLLVGKQY 300 



Sbjct: 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6195> which encodes the amino acid 
sequence <SEQ ID 6196>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0939 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/339 (73%), Positives = 283/339 (83%)
Query: 1 WYLIEPIRNGKRITDGIAIALAMQVYILQNVFLDDDILFPYYCDPKVEIGKFQNAV'IETNQ 60

MYLIEPIRNGKRITDGA+ALAMQVY+ +N+FLDDDI LFPYYCDPKVE IGKFQNftV+ETNQ 
Sbjct: 1 MYLIEPIRNGKRITDGAVMiAMQVWQENLFLDDDILFPYYCDPKOTIGKFQNAVYETNQ SO 

Query: 61 EYLKEHDIPWRRDTGGGAVYVDSGAVNICYLMKDHGQFGDFKRAYEPAIICALKTLGASS 120 

EYLKEH IPVVRRDTGGGAVYVDSGAVNICYL+ D+G FGDFKR Y+PAI+AL LGA+ 
Sbjct: 61 EYLKEHHIPVVRRDTGGGAVYVDSGAVNICYLINDNGIFGDFKRTYQPAIEALHHLGATE 120 

Query: 121 VEMRERTOLVIDGKKVSGAAMTIVNGRIYGGYSLLLDVDFDAMEKVLNPNRKKIESKGIK 180 

VEM RNDLVIDGKKVSGAAMTI NGR+YGGYSLLLDVDF-fAMEK L PNRKKIESKGI+ 
Sbjct: 121 VEMSGRNDLVIDGKKVSGAAMTIANGRVYGGYSLLLDVDFEAMEKALKPNRKKIESKGIR 180 

Query: 181 SVRSRVGDIRSHLSEDYRHITTDQFKDLMVCQLLHIDHIDQAKRYHLTEKDWAAIDAIAD 240 

SVRSRVG+IR HL+ Y+ IT ++FKDLMVCQLL 1+ I QAKRY LTEKDW IDAL 4- 
Sbjct: 181 SVRSRVGNIREHLAPQYQGITIEEFKDLMVCQLLQIETISQAKRYDLTEKDWQQIDALTE 240 

Query: 241 EKYKNWDWNYGNSPQYSYHRDARFPSGTYDFHLEIEKGIITNCRIYGDFFSSKDISDIEN 300

KY NW+WNYGN+PQY YHRD RF SID HL+I+KG I CRIYGDFF DI+++E 
Sbjct: 241 RKYHN^WNYGNAPQYRYHRDGRFTGGTVDIHLDIKKGYIAACRIYGDFFGKADIAELEG 300 

Sbjct: 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2004 

A DNA sequence (GBSx2114) was identified in S.agalactiae <SEQ ID 6197> which encodes the amino 
acid sequence <SEQ ID 6198>. Analysis of this protein sequence reveals the following: 
Possible site: 17 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 196 - 212 ( 196 - 212) 



Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB49329 GB:U39612 formyl-tetrahydrofolate synthetase
[Streptococcus mutans]
Identities = 432/556 (77%), Positives = 493/556 (87%)

Query: 1 MKTDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKYKAKLSFDKIEAVKSQKVGKLILVT 60

MKTDIEIAQSV L+PI +V+++GI FDD+ELYGKYKAKL+FDKI+AV+ GKL+DVT 
Sbjct: 1 MKTDIEIAQSVDLRPITNWKKLGIDFDDLELYGKYKAKLTFDKIKAVEENAPGKLVLVT 60 

Query: 61 AINPTPAGEGKSTMSIGLADALNKIGKKTMIALREPSLGPVMGIKGGAAGGGYAQVLPME 120 
AINPTPAGEGKST++IGLADALNKIGKKTMIA+REPSLGPVMGIKGGAAGGGYAQVLPME

Sbjct: 61 AINPTPAGEGKSTITIGLADALNKIGKKTMIAIREPSLGPVMGIKGGAAGGGYAQVLPME 120 

Query: 121 DINLHFTGDMHAITTAffl^ALSALLDITOIHQGITOLDIDQRRVIWKRWDIiNDRALRQVIVG 180 
DINLHFTGDMHAITTANNALSAL+DNH+HQGNEL 1DQRR+ IWKRWDLNDRALR V VG 
Sbjct: 121 DINLHFTGD^^ITTANNALSALIDNHLHCGNELGIDQRRIIWI<KVVDLNDRALRHVTVG 180

Query: 181 LGSPVNGIPREDGFDIWASEI^ILCIATDLSDLKKRLSNIVVAYSRNRKPIYVKDLKI 240 

LGSP+NGIPREDGFDITVASEIMAILCLAT++ DLK+RL+NIV+ Y +R P+YV+DL++ 
Sbjct: 181 LGSPINGIPREDGFDITVASEIMAILCLATNVEDLKERLANIVIGYRFDRSPVYVRDDEV 240 
Sbjct: 241 QGALALILKEAIKPNLVQTIYGTPAFVHGGPFANIAHGCNSVLATSTALRLADYTITEAG 300

Query: 301 FGMDLGAEKFLDIKTPNLPTSPDAIVIVRTLRaLKMHGGVSKEDLSQENVEAVKRGFTNL 360 

FGADLGAEKFLDIK PNLPTSPDA+VIVAT+RALKM+GGV+K+ L+QENVEAVK GF NL 
Sbjct: 301 FGADLG^KFLDIKAPNLPTSPim.WimTIRALKOTGGVAKDALNQElWBaVKAGFANL 360 

Query: 361 ERHVNNMRQYGVPVWAINQFTADTESEIATLKTLCSNIDVAVELASVWEDGADGGLELA 420 

RHV KMR+YGVPWVAIN+F DT EIA L+ LC+ IDV VELASVW +GADGG++LA 
Sbjct: 361 ARHWISMROGVPVWAINEFITDT1TOEIAVLRNLCAAIDVPWLASVWANGADGGVDIA 420 

T+ N IE S+YKRLY++ ++EEK+ +1 +IY +KV F KA+ Q+ + NGWD 
Sbjct: 421 NTLIOTIENNPSHYiffiLYDimLSVEEIOTEIAI<EIYT^KVIFEKKAKTQIAQrVKNGTO 480 

Query: 481 KMPICMAKTQYSFSDNPNLLGAPTDFDITVREFVPKTGAGFIVALTGDVLTMPGLPKKPA 540

+PICMAKTQYSFSD+P LLGAPT FDIT+RE VPK GAGFIVALTGDV+TMPGLPKKPA 
Sbjct: 481 NLPICMAKTQYSFSDDPKLLGAPTGFDITIRELVPKLGAGFIVALTGDVMTMPGLPKKPA 540 

Query: 541 ALNMDVLEDGTAIGLF 556 

ALNMDV DGTA+GLF 
Sbjct: 541 ALNMDVAADGTALGLF 556 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6199> which encodes the amino acid 
sequence <SEQ ID 6200>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.49 Transmembrane 196 - 212 ( 196 - 212) 

Final Results 

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB49329 GB:U39612 formyl-tetrahydrofolate synthetase
[Streptococcus mutans]
Identities = 432/556 (77%), Positives = 490/556 (87%)

Query: 1 MKSDIEIAQSVALQPITDIVKKVGIDGDDIELYGKYKAKLSFEKMKAVEANEPGKLILVT 60 

MK+DIEIAQSV L+PIT++VKK+GID DD+ELYGKYKAKL+F+K+KAVE N PGKL+LVT 
Sbjct: 1 MKTDIEIAQSVDLRPITNWKKLGIDFDDLELYGKYKAKLTFDKIKAVEENAPGKLVLVT 60 

Query: 61 AINPTPAGEGKSTMSIGLADALNQMGKKTMLALREPSLGPVMGIKGGAAGGGYAQVLPME 120

A1NPTPAGEGKST++IGI^ALN++GKKTM-A+REPSLGPVMGIKGGAAGGGYAQVLPME 
Sbjct: 61 AINPTPAGEGKSTITIGLADALNKIGKKTMIAIREPSLGPVMGIKGGAAGGGYAQVLPME 120 

Query: 121 DINLHFTGDMHAITTANNALSALIDNKIjQQGNELGIDPRRIIWKRvLDIjlTORALRQVIVG 180 

DINLHFTGDMHAITTANNALSALIDNHL QGN+LGID RRIIWKRV+DLNDRALR V VG 
Sbjct: 121 DINLHFTGDMHAITTANim,SALIDNHmQGNELGIDQRRIIWKRVVDI^RALRHVTVG 180 

Query: 181 LGSPVNGVPREDGFDIWASEII^ILCLATDLKDLKKRIADIWAYTYDRKPvYVRDLKV 240 

LGSP+NG+PREDGFDITVASEIMAILCLAT+++DLK+RLA+IV+ Y +DR PVYVRDL+V 
Sbjct: 181 LGSPINGIPREDGFDITVASEIMAILCLATNVEDLKERLANIVIGYRFDRSPVYVRDLEV 240 

Query: 241 EGALTLILKDAIKPNLVQTIYGTPALIHGGPFANIAHGCNSVT^TSTALRLADYTVTEAG 300 

+GAL LILK+AIKPNLVQTIYGTPA +HGGPFANIAHGCNSVLATSTALRLADYT+TEAG 
Sbjct: 241 QGALALILKEAIKPNLVQTIYGTPAFVHGGPFANIAHGCNSVLATSTALRLADYTITEAG 300

Query: 301 FGADLGAEKFI^IKVPNLPKAPDAIVIvATLRALKMHGGVAKSDIAAENCEATOLGFANL 360 

FGADLGAEKFL+IK PNLP +PDA+VI VAT+RALKM+GGVAK L EN EAV+ GFANL 
Sbjct: 301 FGADLGAEKFLDIKAPMiPTSPDAWIVATIRALKMNGGV^ 360 

Query: 361 KEHVENMRQFKVPVWAINEFVADTEMIATLKALCEEIKVPVEIASVWANGAE 420 

RHVENMR++ VPVWAINEF+ DT EIA L+ LC I VPVELASVWANGA+GG+ LA. 
Sbjct: 361 ARHVENMRKYGVPVWAINEFITDTNDEIAVLRWLC^IDVPvEIASWANGADGGvDLA 420 
Query: 421 KTVVRVIDQEAADYKRLYSDEDTLEEKVINIVTQIYGGKAVQFGPKAKTQLKQFAEFGWD 480

T++ 1+ + YKRLY + ++EEKV I +IY V F KAKTQ+ Q + GWD 
Sbjct: 421 OTLIOTIENNPSHYKRLYDNNLSVEEKVTEIAKEIYRM3KVIFEKKAKTQIAQIVKNGMD 480 

Query: 481 KLPVCMAKTQYSFSDNPSLLGAPTDFDITIREFVPKTGAGFIVGLTGDVMTMPGLPKVPA 540 

LP+CMAKTQYSFSD+P LLGAPT FDITIRE VPK GAGFIV LTGDVMTMPGLPK PA 
Sbjct: 481 NLPICMAKTQYSFSDDPKLLGAPTGFDITIRELVPKLGAGFIVALTGDVMTMPGLPKKPA 540 

Query: 541 AMAMDVAENGTALGLF 556

A+ MDVA +GTALGLF 
Sbjct: 541 ALNMDVAADGTALGLF 556 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 452/556 (81%), Positives = 513/556 (91%)

Query: 1 MKTDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKYKAKLSFDKIEAVKSQKVGKLILVT 60 

MK+DIEIAQSVAL+PI 4-IV++VG1 DDI3LYGKYKAKLSF+K++AV++ + GKLILVT 
Sbjct: 1 MKSDIEIAQSVALQPITDIVKKVGIDGDDIBLYGKXKAKLSFEKMKAVEANEPGKliILvT 60 

Query: 61 AINPTPAGEGKSTMSIGLADALNKIGKKTMIALREPSLGPVMGIKGGAAGGGYAQVLPME 120

AINPTPAGEGKSTMSIGIiADALN++GKKTM4-ALREPSLGPVMGIKGGAAGGGYAQVLPME 
Sbjct: 61 AINPTPAGEGKSTMSIGOffiALNQMGKKTMIAIiREPSLGPVMGIKGGAAGGGYAQVLPME 120 

Query: 121 DINLHFTGDMHAITTANNALSALLDNHIHQGNELDIDQRRVIWKRVVDLNDRALRQVIVG 180

DIKLHFTGDMHAITTANNALSAL+DNH+ QGN+L ID RR+IWKRV+DLNDRALRQVIVG 
Sbjct: 121 DINLHFTGDMHAITTANMALSALIDNHLQQGlTOLGIDPRRIIWKRVLDIiNDRALRQVIVG 180 

Query: 181 LGSPWGIPREDGFDITVASEIMAILCLATDLSDLKKRLSNIWAYSRNRKPIYVKDLKI 240 

LGSPVNG+PREDGFDITVASEIMAILCLATDL DLKKRL++IWAY+ +RKP+YV+DLK+ 
Sbjct: 181 LGSPWGVPREDGFDIWASEIMAILCIATDLKDLKKRLADIWAYTYDRKPVYVRDLKV 240 

Query: 241 EGALTLILKDTIKPNLVQTIYGTPALVHGGPFAN3AHGCNSVLATSTALRLADYVVTEAG 300 

EGALTLILKD IKPNLVQTIYGTPAL+HGGPFANIAHGCNSVLATSTALRLADY VTEAG 
Sbjct: 241 EGALTLILKDAIKPNLVQTIYGTPALIHGGPFANIAHGCNSVIATSTALRLADYTVTEAG 300 

Query: 301 FGADLGAEKFLDIKTPNLPTSPDAIVIVATLRALKMHGGVSKEDLSQENVEAVKRGFTNL 360 

FGADLGAEKFL+IK PNLP +PDAIVIVATLRALKMHGGV+K DL+ EN EAV+ GF NL 
Sbjct: 301 FGADLGAEKFLNIKVPNLPKAPDAIVIVATLRALKMHGGVAKSDLAAENCEAVRLGFANL 360 

Query: 361 ERm/NNMRQYGVPVWAINQFTADTESEIATLKTLCSNIDVAVEIASVWEDGADGGLELA 420 

+RHV NMRQ+ VPVWAIN+F ADTE+EIATLK LC IV VELASVW +GA+GGL LA 
Sbjct: 361 KRHVEIWIRQFKVPVVVAINEFVADTEIAEIATLKMjCEEIK^ 420 

Query: 421 QTVANVIETQSSNYKRrjYNDEDTIEEKIKKIVTKIYGGNKVHFGPKAQIQLKEFSDNGWD 480 

+TV VI + ++++YKRLY+DEDT+EEK+ IVT+IYGG V FGPKA+ QLK+F++ GWD 
Sbjct: 421 KTVVRVIDQEAADYKRLYSDEDTLEEKVINIVTQIYGGKAVQFGPKAKTQLKQFAEFGWD 480 

Query: 481 KMPICMAKTQYSFSDNPNLLGAPTDFDITVREFVPKTGAGFIVALTGDVLTMPGLPKKPA 540

K+P+CMAKTQYSFSDNP+LLGAPTDFDIT+REFVPKTGAGFIV LTGDV+TMPGLPK PA 
Sbjct: 481 KLPVCMAKTQYSFSDNPSLLGAPTDFDITIREFVPICTGAGFIVGLTGDvMTMPGLPKVPA 540 

Query: 541 ALNMDVLEDGTAIGLF 556 

A+ MDV E+GTA+GLF 
Sbjct: 541 AMAMDVAENGTALGLF 556 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9057> which encodes amino acid sequence 
<SEQ ID 9058>. Analysis of this protein sequence reveals the following: 

Possible site: 13 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.49 Transmembrane 516 - 532 ( 516 - 533)

Final Results 

bacterial membrane - 
bacterial outside - 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 


389/555 (69%), Gaps = 2/555 (0%) 

Query: 4 SDIEIANSVTMEPISKVADQLGIDKEftLCLYGKYKAKIDARQLVALKNKPDGKLILVTAI 63 

+DIEIA SV ++PI+++ +Q+GI + 4 LYGKYKAK+ ++ A+K++ GKLILVTAI 
Sbjct: 3 TDIEIAQSVALKPIAEIVEQVGIGFDDIELYGKYKAKLSFDKIEAViCSQKVGKLILVTAI 62 

Query: 64 SPTPAGEGKTTTSVGLVDALSAIGKKAVIALREPSLXXXXXXXXXXXXXXXXXXXPMEDI 123

+PTPAGEGK+T S+GL DAL+ IGKK +IALREPSL PMEDI 
Sbjct: 63 NPTPAGEGKSTMSIGLADALNKIGKKTMIALREPSLGPVMGIKGGAAGGGYAQVLPMEDI 122

Query: 124 KLHFTGDFHAIGVAOTLIAALIDNHIHHGNSLGIDSRRITWKRVVDMNDRQLRHIVDGLQ 183 

NLHFTGD HAI ANN L+AL+DNHIH GN L ID RR+ WKRWD+NDR LR ++ GL 
Sbjct: 123 NLHFTGDMHAITTANNALSALLDNHIHQGNELDIDQRRVIWKRWDLNDRALRQVIVGLG 182 

Query: 184 GKVNGIPREDGYDITVASEIMAILCLSENISDIjKARLEKI I IGYNYQGEPVTXXXXXXXX 243 

VNGIPREDG+DITVASEIMAILCL+ ++SDLK RL I++ Y+ +P+ 
Sbjct: 183 SPVNGIPREDGFDITVASEIMAILCLATDLSDLKKRLSNIWAYSRNRKPIYVKDLKIEG 242 

Query: 244 XXXXX}LXXXIHPNLVQTLEHTPALIHGGPFANIAHGCNSVLATKIALKYGDYAVTEAGFG 303 

I PNLVQT4- TPAL+HGGPFANIAHGCNSVIAT AL+ DY VTEAGFG 
Sbjct: 243 ALTLILKDTIKPNLVQTIYGTPAIjVHGGPFANIAHGCNSVLATSTALRLADYVVTEAGFG 302 

Query: 304 ADLGAEKFIDIKCRMSGLRPAAVVLVATIRALKMHGGVPKADIATENVQAVVDGLPNLDK 363 

ADLGAEKF+DIK P A+V+VAT+RALKMHGGV K DL+ ENV+AV G NL++ 

Sbjct: 303 ADLG1AEKFLDIKTPNLPTSPDAIVIVATLRALKM0GGVSKEDLSQENVEAVKRGFTNLER 362 

Query: 364 HLANI QDVYGLPWVAINKFPIiDTDAELQAVYDACDKRGVDWI SDVWANGGAGGRELAE 423 

H+ N++ YG+PVWAIN+F DT++E+ + C V V ++ VW +G GG ELA+ 
Sbjct: 363 HVNNMRQ-YGVPWVAINQFTADTESEIATLKTLCSNIDVAVELASVWEDGADGGLEIAQ 421 

Query: 424 KVVTLAE-QDNQFRFVYEEDDS1ETKLTKIVTKVYGGKGINLSSAAKRELADLERLGFGN 482 
V + E Q + ++ +Y ++D+IE K+ KIVTK+YGG ++ A+ +L + G+ 

Query: 483 YPIC^KTQYSFSDDAKKLGAPTDFTVTISNLKVSAGAGFIVALTGAIMTMPGLPKVPAS 542 

PICMAKTQYSFSD+ LGAPTDF +T+ GAGFIVALTG ++TMPGLPK PA+ 

Sbjct: 482 MPICMAKTQYSFSDNPNLLGAPTDFDITVREFVPKTGAGFIVALTGDVLTMPGLPKKPAA 541 

Query: 543 ETIDIDEEGNITGLF 557 

+D+ E+G GLF 
Sbjct: 542 LNMDVLEDGTAIGLF 556

SEQ ID 6198 (GBS131) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 6; MW 64.8kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 35 (lane 4; MW 90kDa). 
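
As a rough plausibility check on the two fusion products, the difference between the GST-fusion and His-fusion masses should be close to the size of the GST tag. The Python sketch below is illustrative only: it assumes a GST tag of roughly 26 kDa and ignores the small His tag, neither of which is stated in this document, and the tolerance is chosen arbitrarily.

```python
# Apparent molecular weights (kDa) reported above for the two GBS131 fusions.
HIS_FUSION_KDA = 64.8
GST_FUSION_KDA = 90.0

# Assumption (not from this document): a typical GST tag contributes about 26 kDa.
ASSUMED_GST_TAG_KDA = 26.0

def tag_mass_difference(gst_fusion_kda, his_fusion_kda):
    """Return the mass difference (kDa) between the GST- and His-fusion products."""
    return round(gst_fusion_kda - his_fusion_kda, 1)

difference = tag_mass_difference(GST_FUSION_KDA, HIS_FUSION_KDA)
# Loose, arbitrarily chosen tolerance of 3 kDa for gel-based estimates.
consistent = abs(difference - ASSUMED_GST_TAG_KDA) <= 3.0
print(f"GST-fusion minus His-fusion: {difference} kDa (within tolerance: {consistent})")
```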

GBS131-GST was purified as shown in Figure 201, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2005 

A DNA sequence (GBSx2115) was identified in S.agalactiae <SEQ ID 6201> which encodes the amino 
acid sequence <SEQ ID 6202>. Analysis of this protein sequence reveals the following: 
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110)
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78) 
INTEGRAL Likelihood = -0.69 Transmembrane 275 - 291 ( 275 - 291) 
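
The predicted transmembrane segments listed above can be extracted and ranked with a few lines of code. This Python sketch is illustrative and not part of the original analysis; the function name `parse_segments` is hypothetical, and it assumes that a more negative likelihood value indicates a more strongly predicted membrane-spanning segment.

```python
import re

# Captures the likelihood score and the first (core) residue range of each segment.
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_segments(text):
    """Return a list of (likelihood, start, end) tuples, one per predicted segment."""
    return [(float(score), int(start), int(end)) for score, start, end in TM.findall(text)]

report = """
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110)
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78)
INTEGRAL Likelihood = -0.69 Transmembrane 275 - 291 ( 275 - 291)
"""
segments = parse_segments(report)
# Assumption: the most negative likelihood marks the strongest prediction.
strongest = min(segments)
length = strongest[2] - strongest[1] + 1
print(f"{len(segments)} segments; strongest spans {strongest[1]}-{strongest[2]} ({length} residues)")
```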

Final Results

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA88609 GB:M37842 unknown protein [Streptococcus mutans] 
Identities = 243/373 (65%), Positives = 302/373 (80%), Gaps = 1/373 (0%) 
C FSPI+PF+STYYNYRDHRKI+VID V GG+NLADEYIN IE FG+WKD+ +M



? L+FLQMWS T + APYL + + GYVIPY DSPLD +KVGENV 



YIDILN AR+YVYIMTPYLILDSE+EEA+QFAAERGVDV+IIMPGIPDK +P+ALAK Y 



AL +GVKIYE+ 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6203> which encodes the amino acid 
sequence <SEQ ID 6204>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.86 Transmembrane 84 - 100 ( 81 - 104) 
INTEGRAL Likelihood = -8.33 Transmembrane 28 - 44 ( 23 - 49) 
INTEGRAL Likelihood = -6.74 Transmembrane 56 - 72 ( 53 - 74) 



Final Results 

bacterial membrane --- Certainty=0.4545 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAA23240 GB:J02911 formyltetrahydrofolate synthetase (FTHFS)
(ttg start codon) (EC 6.3.4.3) [Moorella thermoacetica]
Identities = 350/557 (62%), Positives = 438/557 (77%), Gaps = 2/557 (0%) 

Query: 2 VLSDIEIANSVTMEPISKVADQLGIDKE^C^YGKYKAKIDARQLVALKNKPDGKLILVT 61 

V SDIEIA + M+P+ ++A LGI ++ + LYGKYKAKI LK+KPDGKLILVT 
Sbjct: 4 VPSDIEIAQAAKMK^VMELARGLGIQEDEVELYGK^KAKISLDVYRRLKDKPDGKLILVT 63 

Query: 62 AISPTPAGEGKTTTSVGLVDALSAIGKKAVIALREPSLGPVFGVKGGAAGGGHAQWPME 121 
Query: 122 DINLHFTGDFHAIGVANNLIAALIDNHIHHGNSLGIDSRRITWKRVVDMNDRQLRHIVDG 181

DINLHFTGD HA+ A+NLLAA++DNH+ GN L ID R ITW+RV+D+NDR IiR+IV G 
Sbjct: 124 DINLHFTGDIHAWYAHNLLAAMVDNKLQQGNVLNIDPRTITWRRVIDIiNDRALRNIVIG 183 

Query: 182 LQGKVNGIPREDGYDITVASEIMAILCLSENISDLKARLEKII1GYNYQGEPVTAKDLKA. 241 

L GK NG+PRE G+DI+VASE+MA LCL+ ++ DLK R +I++GY Y G+PVTA DL+A 
Sbjct: 184 LGGKANGVPRETGFDISVASEVMACLCLASDLMDLKERFSRIWGYTYDGKPVTAGDLEA 243 

Query: 242 GGAIAALLKDAIHPNLVQTLEHTPALIHGGPFANIAHGCNSVLATKLALKYGDYAVTEAG 301 

G++A L+KDAI PNLVQTLE+TPA IHGGPFANIAHGCNS++ATK ALK DY VTEAG 
Sbjct: 244 QGSMALLMKDAIKPNLVQTLENTPAFIHGGPFANIAHGCNSIIATKTALKLADYVVTEAG 303 

Query: 302 FGADLGAEKFIDIKCRMSGLRPAAWLVATIRALKMHGGVPKADLATENVQAWDGLPNL 361 

FGADLGAEKF D+KCR +G +P A V+VAT+RALKMHGGVPK+DLATEN++A+ +G NL 
Sbjct: 3 04 FGADLGAEKFYDVKCRYAGFKPDAWIVATVRALKMHGGVPKSDLATENLEALREGFANL 363 

Query: 362 DKHLANIQDVYGLPVWAINKFPLDTDAELQAVYDACDKRGVDWISDVWANGGAGGREL 421 

+KH+ NI +G+P WAIN FP DT+AEL +Y+ C K G +V +S+VWA GG GG EL 
Sbjct: 364 EKHIENI-GKFGVPAWAINAFPTDTEAELNLLYELCAKAGAEVALSEVWAKGGEGGLEL 422 

Query: 422 AEKW-TIAEQDNQFRFWEEDDSIETKLTKIVTKVYGGKGINLSSAAKRELADLERLGF 480 

A KV+ TL + + F +Y D SI+ K+ KI T++YG G+N ++ A + + E LG+ 
Sbjct: 423 ARKVLQTLESRPSNFHVLYNLDLSIKDKIAKIATEIYGADGVNYTAEADKAIQRYESLGY 482 

Query: 481 GJSTYPICI^KTQYSFSDDAKJ<IjGAPTDFTVTISNIjKVSAGAGFIVALTGAIMTMPGLPKVP 540 

GN P+ MAKTQYSFSDD KLG P +FT+T+ ++4-SAG IV +TGAIMTMPGLPK P 
Sbjct: 483 GNLPVVMAKTQYSFSDDMTKLGRPRNFTITVREVRLSAGGRLIVPITGAIMTMPGLPKRP 542 

Query: 541 ASETIDIDEEGNITGLF 557 

A+ IDID +G ITGLF 
Sbjct: 543 AACNIDIDADGVITGLF 559 

!GB:M37842 unknown protein [Streptococcus mutans] (v. . . 517 e-145



Query: 68 VLYLVNSDMDAISRMTWLILIMIAPLLGSLFLIYTKLDWGYRGLKQRINHLVDLSAPYLS 127 

VLYLVNS MD +S +TOL++I+ P+LG+LFLIYTK DWGYR LK I PY 
Sbjct: 5 VLYLVNSQMDTIiS I ITWLLVI LPFP I LGTLFLIYTKQDWGYRELKSL I KKSTQAI KPYFQ 64 

Query: 128 DDmiLEVLKDSTSTTYHLVQYLERSRGNFPIYNNTRVTYFPTGETFFDSLKEQLFLAKK 187 

D IL LK+S + TY+L QYL RS G FP+Y NT+VTYFP G++ F+ +K+QL A+K 
Sbjct: 65 YDQRILYKLKESHARTYNLAQYLHRS-GGFPVYKNTKVTYFPNGQSKFEEMKKQLLKAEK 123 

Query: 188 YIFLEFFIIAEGQMWGEILSILEKKVSEGVEVRVLFDGMNELSTLSSDYAKRLEQIGIKA 247 

+IFLE+FIIAEG MWGEILSILE+KV EGVEVRV++DGM ELSTLS DYAKRLE+IGIKA 
Sbjct: 124 FIFLEYFIIAEGLMWGEILSILEQKVQEGVEVRVMYDGMLELSTLSFDYAKRLEKIGIKA 183 

Query: 248 KSFLPISPFISTYYNYRDHRKIWIDGEVSFTGGINLADEYINEVERFGHWKDAGLMLEG 307 

K F PI+PF+STYYNYRDHRKI+VID +V+F GGINLADEYIN++ERFG+WKD +MLEG 
Sbjct: 184 KVFSPITPFVSTYYNYRDHRKILVIDNKVAFNGGINLADEYINQIERFGYWKDTAVMIiEG 243 

Query: 308 EATDSFLILFLQMWSITEKELIIDPYLSDHSLKLPSDGYVIPYGDSPLDTDKIGKNVYID 367 

E SF ++FLQMWS T K+ PYL+ + ++ ++GYVIPY DSPLD +K+G+NVYID 
Sbjct: 244 EGVASFTLMFLQMWSTTNKDYEFAPYLTQNFHEIVANGYVIPYSDSPLDHEKVGENVYID 3 03 

Query. 368 IIMI^YVYIMTPYLILDSEMEHALRFASERGVDIRIIMPGVPDKGVPYAI^AKTYYKAL 427 

ILN A++YVYIMTPYLILDSEMEHAL+FA+ERGVD++IIMPG+PDK VP+ALAK Y+ AL 
Sbjct: 304 ILNQARDYVYIMTPYLILDSEMEHALQFAftERGVDVKIIMPGIPDKKVPFALAKRYFPAL 363 



Query: 428 MSSGVKIYEY 437 

+ +GVKIYE+ 
Sbjct: 364 LDAGVKIYEF 373 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 362/524 (69%), Positives = 437/524 (83%)

Query: 8 LISNKVKITOLLNKSKKSLLRGIFSRTTVIAILLILQLLFLLASYSWLEQYRVWLATVEH 67 

+ 1 K K+ LL+K K LRGIFSRTT+I +L+ILQL+FL SY+W+EQYRVW+ +E 
Sbjct: 2 IIKKKAKVKYLLHKGKHGFLRGIFSRTTIIVLLIILQLVFLFQSYAWMEQYRVWITILES 61 

Query: 68 ILTIGAVLYLWSEMDALSRVTWLILWIAPLLGAMFLMYTKFDWGYRGLKQRLETLIDE 127 

+ I VLYLVNS+MDA+SR+TWLIL+MIAPLLG++FL+YTK DWGYRGLKQR+ L+D 
Sbjct: 62 VFAITIVLYLVNSDMDAISRMTWLILIMIAPLLGSLFLIYTKLDWGYRGLKQRINHLVDL 121 

Query: 128 SQIYLEDDPETLNQLKSSTSTTYHLVQYFEKAHGNFPVYRNTDVTFLPTGEAFFEKMKEE 187 

S YL DD L LK STSTTYHLVQY E++ GNFP+Y NT VT+ PTGE FF+ +KE+ 
Sbjct: 122 SAPYLSDDDAILEVLKDSTSTTYHLVQYLERSRGNFPIYNNTRVTYFPTGETFFDSLKEQ 181 

Query: 188 LLIQVKKYIFLEFFIIDEGIMWGEILSILEQKVEEGVEVRILYDGMIEITKLSFDYTKRLE 247 

Ii AKKYIFLEFFII EG MWGEILSILE+KV EGVEVR+L+DGM E++ LS DY KRLE 
Sbjct: 182 LFIA.KKYIFLEFFIIAEGQNIWGEILSILEKKVSEGVEVRVLFDGMNELSTLSSDYAKRLE 241 

Query: 248 KIGIKAKAFSPISPFISTYYNYRDHRKIWIDGWGMTGGVNLADEYINHIELFGHWKDS 307 

+ IGIKAK+F PISPFISTYYNYRDHRKIWIDG V TGG+NLADEYIN +E FGHWKD+ 
Sbjct: 242 QIGIKAKSFLPISPFISTYYNYRDHRKIWIDGEVSFTGGINLADEYINEVERFGHWKDA 301 

Query: 308 GIMLKGKAVDSFLLLFLQMWSITEEKMLVAPYLGVHDDLVENEGYVIPYGDSPLDTDKVG 367 

G+ML+G+A DSFL+LFLQMWSITE+++++ PYL H + ++GYVI PYGDSPLDTDK+G 
Sbjct: 302 GLMLEGEATDSFLILFLQMWS ITEKELI IDPYLSDHSLKLPSDGYVI PYGDSPLDTDKIG 361 

Query: 368 ENVYIDILNHAREYVYIMTPYLILDSELEHAIQFAAERGVDVRIIMPGIPDKPIPYALAK 427 

+NVYIDILNHA+EYVYIMTPYLILDSE+EHA++FA+ERGVD+RIIMPG+PDK 4-PYALAK 
Sbjct: 362 KNVYIDILNHAKEYVYIMTPYLILDSEMEHALRFASERGVDIRIIMPGVPDKGVPYALAK 421 

Query: 428 TYYQALTKSGVKIYEYTLGFVHSKIFLSDNTKAWGTINLDYRSLYHHFECAVYLYKVDA 487 

TYY+AL SGVKIYEY GFVHSK+F+SDNTKAWGTINLDYRSLYHHFECA YLY+V 
Sbjct: 422 TYYKALMSSGVKIYEYQPGFVHSKVFISDNTKAWGTIKLDYRSLYHHFECATYLYRVSV 481 

Query: 488 IQDIYRDYMDTLNKSRLVSLKDINNIPKFQKVIGIVTKTIAPLL 531 

I DI D+ + +S L++ + P +QK+IG++ + IAPLL 
Sbjct: 482 IADIVNDFHEAQKQSLLMTSDHLTQREWYQKLIGLLVRIIAPLL 525 

A related GBS gene <SEQ ID 8953> and protein <SEQ ID 8954> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop Possible site: -1 Crend: 6 

McG: Discrim Score: -8.80 

GvH: Signal Score (-7.5): -1.94 
Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

ALOM program count: 4 value: -10.03 threshold: 0.0 

INTEGRAL Likelihood =-10.03 Transmembrane 34 - 50 ( 29 - 56) 
INTEGRAL Likelihood = -7.70 Transmembrane 90 - 106 ( 84 - 110) 
INTEGRAL Likelihood = -1.97 Transmembrane 62 - 78 ( 62 - 78) 
PERIPHERAL Likelihood = 1.22 199 
modified ALOM score: 2.51 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.5012 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
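
The "Final Results" block above assigns a certainty value to each candidate location. A minimal sketch of how such a block could be tabulated follows, assuming the `bacterial <location> --- Certainty=<value> (<verdict>)` layout used throughout these reports. The names `parse_final_results` and `predicted_location` are hypothetical, and the pattern deliberately tolerates the stray spaces that scanning has introduced into some of the numbers.

```python
import re

# Tolerates optional "---" separators and spaces inside the number ("0 . 5012").
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)[\s\-—]*Certainty\s*=\s*(\d)\s*\.\s*([\d ]+?)\s*\(\s*([^)]+?)\s*\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for each result line in text."""
    results = {}
    for m in RESULT_RE.finditer(text):
        location, whole, frac, verdict = m.groups()
        certainty = float(f"{whole}.{frac.replace(' ', '')}")
        results[location] = (certainty, verdict)
    return results


def predicted_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


report = """
bacterial membrane Certainty=0 . 5012 (Affirmative) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
"""
results = parse_final_results(report)
print(results)
print(predicted_location(results))
```

Choosing the maximum certainty is only one possible summary; where several locations carry similar values, the full table is more informative than a single label.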

The protein has homology with the following sequences in the databases: 

32.5/57.2% over 498aa 

Bacillus firmus 

SP|O66043| CARDIOLIPIN SYNTHETASE (EC 2.7.8.-) (CARDIOLIPIN SYNTHASE) (CL SYNTHASE). 
Insert characterized 

GP|2952028|gb|AAC05444.1||U88888 cardiolipin synthase Insert characterised
ORF01572(409 - 1893 of 2193)

SP|O66043|CLS_BACFI(5 - 503 of 503) CARDIOLIPIN SYNTHETASE (EC 2.7.8.-) (CARDIOLIPIN
SYNTHASE) (CL SYNTHASE). GP|2952028|gb|AAC05444.1||U88888 cardiolipin synthase {Bacillus

%Match =17.9 

%Identity =32.5 %Similarity =57.1 

Matches = 162 Mismatches = 204 Conservative Sub.s = 123

153 183 213 243 273 303 333 363 

NLQLSIWMF*KTVQPLDYFK* *RGRACDASLFLLGIRF*LEI I *NNRMLFK*QYAIIK*LIWRGEKLISNICVKIVRLIjNK 

393 423 447 477 507 528 558 588 

SKKSLLRGIFSRTTVIAIliLILQLLF- -LLASYSWLEQYRVWLATVEHILT IGAVLYLVNSEMDALSRVTWLILvMI 

: =| : IN | | =|: = | :| |: || :::: | :|||=== 
MKNR1MVIAFFALLFAALYISRGFLQSWMVGTLSWFTLSVIFIGIIIFFEN--RHPTKTLTWLLV1AA 



APLLGAMFLMYTKFDWGYRGLKQRLETLIDESQIYLE-DDPETraQLKSSTSTTYHLVQYFEKAH--GNFPVYRNTDVTF
|::| |: | | :| |: : |:: : : : : ||: : | | || | |: ::: 

FPWG- -FFFYLMFGQITORKSKRFSKKAIEDERAFQKIEGQRQLNE-EQLKKMGGHQQLLFRLAHKLGKNPISFSSETKV 
80 90 100 110 120 130 140 

849 879 909 939 969 999 1029 1059

LPTGFAFFEKMKEELLKAKKYIFLEFFIIDEGIMWGEILSILEQKVEEGVEVRILYDGMIEITKLSFDYTKRLEKIGIKA 
I |: : : : I |: :| ||::|: : : I I I I =111 II II I I = III I = I |: = 
LTDGKETYAHILQALKMAEHHIHLEYYIVRHDDLGNQIKDILISKAKEGVHVRFLYDG-VGSWKLSKSYVEELRDAGVEM 
160 170 180 190 200 210 220 


1086 1116 1146 1176 1206 1236 1266 1293 

KAFSPIS-PFISTYYNYRDHRKIWIDGWGMTGGWL?J3EYINHIELFGHWtCDSGIMLKGKAVDSFLLLFLQNlWSI-TE 

= 111= ll = = llhlllhlllllll 11 = 1= 111= 11 = 1 = 1= = = = l = ll == 1 = 111 I I 

VSFSPVKLPFLTHTINYRNHRKIIVIDGWGFVGGLNIGDEYLGKDAYFGYWRDTHLYVRGEAVRTLQLIFLQDWHYQTG 
240 250 260 270 280 290 300

1323 1353 1383 1413 1443 1473 1503 1533 

EKMLVAPYLGVHDDLVENEGWIPYGDSPLDTDKVGEIWYIDILNHAREYVYIMTPYLILDSELEHAIQFAAERGVDVRI 
| =| || = : :| | | :| : :: :: |:: ==| :||:| | := |== || |:|||| 

ETILNQTYLSPSLSMTKGDGGVQMIASGPDTRWEVNKKLFFSMITSAKKSIWIASPYFIPDDDILSALKIAALSGIDVRI
320 330 340 350 360 370 380 

1563 1593 1623 1653 1683 1713 1743 1773 

IMPGIPDKPIPYALAKTYYQALTKSGVKIYEYTLGFVHSKI FLSDNTKAVVGTINLDYRSLYHHFECAVYLYKVDAIQDI 
::| HI | : :::|: | ::|||:||| ||:|||| : |: | :|| |:| ||:: :|| |||: :: :

LVPNRPDKRIVFHASRSYFPELLEAGVTOTEYNRGFMHSKIIITOHEIASIGTSNMDMRSFHLNFEWAYLYRTSSVTKL 
400 410 420 430 440 450 460 

1803 1833 1863 1893 1923 1953 1983 2013 

YRDY^TIJSIKSRLVSLKDINNIPKFQKVIGIVTKTIAPLL*K*FIFNLILKVN*RI*LYLKSKGCILTKLC*TTVMR*TO
11= I I = = = II h = :| == = = 111 
VSDYVYDLEHSNQINFSLFKNRPFFHRLIESTSRLLSPLL 
480 490 500 

SEQ ID 8954 (GBS277d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 150 (lane 18; MW 51kDa), in Figure 151 (lane 17 & 18; MW 51kDa) and in
Figure 182 (lane 12; MW 51kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE
analysis of total cell extract is shown in Figure 151 (lane 15 & 16; MW 76kDa) and in Figure 58 (lane 5;
MW 87kDa).

GBS277d-His was purified as shown in Figure 235, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
Example 2006

A DNA sequence (GBSx2116) was identified in S.agalactiae <SEQ ID 6205> which encodes the amino 
acid sequence <SEQ ID 6206>. This protein is predicted to be aspartate-semialdehyde dehydrogenase. 
Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9831> which encodes amino acid sequence <SEQ ID 9832> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA26850 GB:J02667 aspartate beta-semialdehyde dehydrogenase (EC 
1.2.1.11) [Streptococcus mutans] 
Identities = 261/357 (73%) , Positives = 304/357 (85%) , Gaps = 1/357 (0%) 

MGYTVAIVGATGAVGT+MI +QLEQS LP+++V+LLS3SRSAGK+L +KD+ + VE TTK+

GIIACPNCSTIQMM+ALEPIRQKWG+ RVIVSTYQAVSG+G A+ ET ++++V+ND +

A CVR+P+L HSE++YIETK++A I E+K AIA FPGAVL+D QIYPQA NAVG R
Sbjct: 241 AHCVRVPILFSHSEAVYIETKDVAPIEEVKAAIAAFPGAVLEDDIKHQIYPQAANAVGSR 300

Query: 301 ETFVGRIRKDLDQENGVHMWWSDNLLKGAAWNSVQIAETLHICNGLVKPAKELKFEL 357

TFVGRI RKDLD ENG+HMWWSDNLLKGAAWNS+ A LH+ GLV+ ELKPEL
Sbjct: 301 - TFVGRI RKDLDI ENGIHMWWSDNLLKGAAW1IS 1 1 TANRLHERGLVRSTSELKFEL 356

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2007 

A DNA sequence (GBSx2117) was identified in S.agalactiae <SEQ ID 6207> which encodes the amino 
acid sequence <SEQ ID 6208>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.66 Transmembrane 33 - 49 ( 33 - 49) 

Final Results 

bacterial membrane --- Certainty=0.2062 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database, but there is 
homology to SEQ ID 500. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2008 

A DNA sequence (GBSx2119) was identified in S.agalactiae <SEQ ID 6209> which encodes the amino 
acid sequence <SEQ ID 6210>. Analysis of this protein sequence reveals the following:

Possible site: 24 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3853 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2009 

A DNA sequence (GBSx2120) was identified in S.agalactiae <SEQ ID 6211> which encodes the amino
acid sequence <SEQ ID 6212>. This protein is predicted to be unnamed protein product (clpP). Analysis of 
this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3883 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10061> which encodes amino acid sequence <SEQ ID 
10062> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6213> which encodes the amino acid 
sequence <SEQ ID 6214>. Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2682 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 175/196 (89%) , Positives = 187/196 (95%) 
Query: 5 MIPWIEQTSRGERSYDIYSRLLKDRIIMLTGQVED^I4AKSIIAQLLFLDAQDNTKDiyL 64

MIPWIEQTSRGERSYDIYSRLLKDRIIMLTG VEDNMANS+IAQLLFLDAQDNTKDIYL 
Sbjct: 1 MIPWIEQTSRGERSYDIYSRLLKDRIIMLTGPVEDNMANSVIAQLLFLDAQDNTKDIYL 60 

Query: 65 YVOTPGGSVSAGI^IVDTMNFIKSDVQTIVMGMAaSMGTIIASSGAKGKRFMLPNAEYMI 124 

YWTPGGSVSAGLAIVDTMNFIK+DVQTIVMGMAASMGT+IASSG KGKRFMLPNAEYMI 
Sbjct: 61 YVIWPGGSVSAGI^IVDT^FIKADVQTIVMGMAASMGTVIASSGTKGKRFMLPNAEYMI 12 0 

Query: 125 HQPMGGTGGGTQQSDMAIAAEHLLKTRHTLEKILADNSGQSIEKVHDDAERDRWMSAQET 184 

HQPMGGTGGGTQQ+DMAIAAEHLLKTRH LEKIIjA N+G++I+++H DAERD WMSA+ET 
Sbjct: 121 HQPMGGTGGGTQQTDMAIAAEHLLKTRHRLEKILAQNAGKTIKQIHKDAERDYWMSAEET 180 

Query: 185 LDYGFIDAIMENNNLQ 200 

L YGFID IMENN L+ 
Sbjct: 181 LAYGFIDEIMENNELK 196 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2010 

A DNA sequence (GBSx2121) was identified in S.agalactiae <SEQ ID 6215> which encodes the amino 
acid sequence <SEQ ID 6216>. This protein is predicted to be uracil phosphoribosyltransferase (upp). 
Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 127 - 143 ( 127 - 144) 
INTEGRAL Likelihood = -0.06 Transmembrane 72 - 88 ( 72 - 89) 
INTEGRAL Likelihood = -0.06 Transmembrane 154 - 170 ( 154 - 170) 

Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10063> which encodes amino acid sequence <SEQ ID 
10064> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26890 GB:L07793 uracil phosphoribosyltransferase 

[Streptococcus salivarius] 
Identities = 192/209 (91%) , Positives = 202/209 (95%) 

Query: 1 MGKFQVISHPLIQHKLSILRRTTTSTKDFRELVDEIAMLMGYEVSRDLPLEDVEIQTPVA 60 

MGKFQVISHPLIQHKLSILRR TSTKDFRELV+EIAMLMGYEVSRDLPLE+VEIQTP+ 
Sbjct: 1 MGKFQVISHPLIQHKLSILRREDTSTKDFRE1VNEIAMLMGYEVSRDLPLEEVEIQTPIT 60 

Query: 61 TWQKQLAGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETFQPVEYLVKLPE 120 

WQKQL+GKKIAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEET +PVEYLVKLPE 
Sbjct: 61 KTVQKQLSGKKLAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETLEPVEYLVKLPE 120 

Query: 121 DIDQRQIFWDPMLATGGSAILAVDSLKKRGAASIKFVCLVAAPEGVAALQEAHPDVDIY 180 

DIDQRQIFVVDPMLATGGSAILAVDSLKKRGAA+IKFVCLVAAPEGV LQ+AHPD+DIY 
Sbjct: 121 DIDQRQIFVVDPMIATGGSAILAvDSLKKRGAftNIKFVCLVAAPEGVKKLQDAHPDIDIY 180 

Query: 181 TAALDEKLNEHGYIVPGLGDAGDRLFGTK 209 

TA+LDEKLNE+GYIVPGLGDAGDRLFGTK 
Sbjct: 181 TASLDEKLNENGYIVPGLGDAGDRLFGTK 209 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6217> which encodes the amino acid 
sequence <SEQ ID 6218>. Analysis of this protein sequence reveals the following: 
Possible site: 26
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.59 Transmembrane 
INTEGRAL Likelihood = -0.22 Transmembrane 



Final Results 

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein is similar to uracil phosphoribosyltransferase from S.salivarius: 

MGK QVI SHPLIQHKLS I LRR+ TSTKDFRELVNEIAMLMGYEVSRDLPLE+V+IQTP++

DI+QRQIF+VT3PMLATGGSA1IAVDSLKKRGAANIKFVCLVAAPEGVKKLQ+AHPDIDI+

TA+LD+ LNE+GYIVPGLGDAGDRLFGTK
Sbjct: 181 TASLDEKLNENGYIVPGLGDAGDRLFGTK 209

An alignment of the GAS and GBS proteins is shown below. 

Identities = 190/209 (90%), Positives = 201/209 (95%) 

Query: 1 MGKFQVISHPLIQHKLSILRRTTTSTKDFRELVDEIAMLMGyEVSRDLPLEDVEIQTPVA 60 

MGK QVISHPLIQHKLSILRR TTSTKDFRELV+EIAMLMGYEVSRDLPLEDV+IQTPV+ 
Sbjct: 1 MGKCQVISHPLIQHKLSIIiRRQTTSTKDFRELvNEIAMLMGYEVSRDLPLEDVDIQTPVS 6 0 

Query: 61 TTVQKQLAGKI<IiAIVPILRAGIGMVDGFLSLVPAAKVGHIGMYRDEETFQPVEYLVKLPE 120 

TVQKQLAGKKLAIVPILRAGIGMVDG LSLVPAAKVGH IGMYR+EET +PVEYLVKLPE 
Sbjct: 61 KTVQKQI^GKKLAIVPILRAGIGMVDGLLSLVPAAKVGHIGMYRNEETLEPVEYLVKLPE 120 

Query: 121 DIDQRQIBW1PMLATGGSAIIAVDSLKKRGAASIKFVCLVAAPEGVAALQEAHPDVDIY 180 

D I + QRQ I F+VD PM LATGGS A I LAVDS LKKRGAA+ 1 KFVCLVAAPEGV LQEAHPD+DI + 
Sbjct: 121 DINQRQIFLVDPMLATGGSAILAVDSLKKRGAANIKFVCLVAAPEGVKKLQEAHPDIDIF 180 

Query: 181 TAALDEKMffiHGYIVPGLGDAGDRLFGTK 209 

TAALD+ LNEHGYIVPGLGDAGDRLFGTK 
Sbjct: 181 TAALDDHLNEHGYIVPGLGDAGDRLFGTK 209 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2011 

A DNA sequence (GBSx2122) was identified in S.agalactiae <SEQ ID 6219> which encodes the amino 
acid sequence <SEQ ID 6220>. This protein is predicted to be hemolysin (patB). Analysis of this protein 
sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.29 Transmembrane 88 - 104 ( 86 - 106) 

Final Results 
bacterial membrane --- Certainty=0.2317 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.



Query: 5 DFTSLPERFSSNTIKWKAVQK DQEILPLWIADMDFPIFPEMSEAIEDFSHQMVFGYD 61

+F ER + ++KW + + LP+W+ADMBF ++EA+++ +FGY
Sbjct: 2 NFDKREERLGTQSVKWDKTGELFGVTDALPMWVADMDFRAPEAITEALKERLDHC3IFGYT 61

HG++ + +S+ GW A+S+A+QAFT+ GD V++ PVY P

4 IDF- L3 + + +V L+I C+PHNP GR W++ 4

V V +VSDEIH DL+L+ +

I- A E A++K WL L

L T+ +K+MKP+ +YL+WLDFSAY L+ E+Q+++ K+IL G +G G+



There is also homology to SEQ ID 1006. 

SEQ ID 6220 (GBS392) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 2; MW 46.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 5; MW 71kDa). 

GBS392-GST was purified as shown in Figure 217, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2012 

A DNA sequence (GBSx2123) was identified in S.agalactiae <SEQ ID 6221> which encodes the amino 
acid sequence <SEQ ID 6222>. This protein is predicted to be rRNA methylase, SpoU family (cspR). 
Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1436 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 
Query: 19 HIVLFEPQIPANTGNIARTCAATNAPLEIIRPMC-FPIDDKKMKRAGLDYWDKLDVSFYDG 78

H+VL++P+IPANTGNIARTCAA.TK LH+IRP4GF DDK +KRAGBDYW+ ++V ++D 
Sbjct: 4 HVVLYQPEIPAOTGNIARTCMTOTTI^LIRPIXSFSTDDKMLKRAGLDY^FVNVVYHDS 63 

Query: 79 LEE-FMLSCRGKVHLISKFADKVYSDENYND-DQDHYFMFGREDKGLPETFMREHAEKAL 136 

LEE F +GK I+KF + ++ +Y D D+D++F+FGRE GLP+ ++ + ++ L 
Sbjct: 64 LEELFFJYYKKGKFFFITKFGQQPHTSFDYTDLDEDYFFVFGRETSGLPKDLIQNNMDRCL 123 

Query: 137 RIPMITOEHVRSLNVSNTVCMIVYEALRQQSFPNLE 171 

R+PM EHVRSLN+SNT ++VYEALRQQ++ +L+ 
Sbjct: 124 RLPMT-EHTOSLNLSNTAAILVYEALRQQNYRDLK 157 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6223> which encodes the amino acid 
sequence <SEQ ID 6224>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2236 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 
Identities = 135/182 (74%) , Positives = 150/182 (82%) 

Query: 1' MIETLTQKNHRSDSGPJSIHIVLFEPQIPAI^TGNIARTCAATNAPLHIIRPMGFPIDDKKM 60 

M + L KN + RNHIVLF+PQIP NTGNIARTCAATNAPLHI I +PMGFPIDD+KM 
Sbjct: 13 MTTKEL INKNDKVKKARNH I VLFQPQI PQNTGNI ARTCAATNAPLHI I KPMGFP I DDRKM 72 

Query: 61 KRAGLDYWDKLDVSFYDGLEEFMLSCRGKVHLISKFADKVYSDENYNDDQDHYFMFGRED 120 

KRAGLDYWDKL++ FYD LE+F+ C G++HLISKFA YS YD HYF+FGRED 
Sbjct: 73 tCRAGLDYWDKLELHFYDHLEQFINQCHGQLHLISKFAVNNYSQATYADGDSHYFLFGRED 132 

Query: 121 KGLPETFMREHAEKALRIPMTOEHWSIJWSNWCTIVYFALRQQSFPNLELSHTYENDK 180
GLPE FMREHAEKALRIPMNDEHVRSLNVSNTVCM++YEALRQQ F LEL HTYE+DK 
Sbjct: 133 TGLPEDFMREHAEKALRIPMiroEHVRSI^SNTVCMVIYEALRQQGFQGLELKHTYEHDK 192 

Query: 181 LK 182 
LK 

Sbjct: 193 LK 194 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2013 

A DNA sequence (GBSx2124) was identified in S.agalactiae <SEQ ID 6225> which encodes the amino
acid sequence <SEQ ID 6226>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane 82 - 98 ( 69 - 100) 

INTEGRAL Likelihood = -6.48 Transmembrane 27 - 43 ( 24 - 47) 

INTEGRAL Likelihood = -5.52 Transmembrane 132 - 148 ( 126 - 151) 

INTEGRAL Likelihood = -5.10 Transmembrane 162 - 178 ( 161 - 185) 

Final Results 

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
A related GBS nucleic acid sequence <SEQ ID 9411> which encodes amino acid sequence <SEQ ID 9412>
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13143 GB:Z99110 similar to amino acid permease [Bacillus subtilis]

Identities = 46/143 (32%), Positives = 81/143 (56%), Gaps = 1/143 (0%) 

Query: 3 FAYDGWTII^IAPEVKNPKKNLPLAFVIGPALILLSYIAFFYGLTQILGASFIMTTGND 62 
FAYDGW + + E+KNP+K LP A G ++ Y+ + L 1L A+ I+T G + 
Sbjct: 203 FAYDGWILIAALGGEMKNPEKLLPRAMTGGLLiIVTAIYIFINFALLHIIiSAIIEIVTLGEN 262

Query: 63 AINYAANIIFGPSVGRLLSFIVILSVLGVANGLLLGTMRLPQAFAERGWIK-SERMANIN 121 

A + AA ++FG G+L+S +I+S+ G NG +L R+ A AER + +E++++++ 
Sbjct: 263 ATSTAATMLFGS IGGKLI SVGI IVS I FGCLNGKVLSFPRVSFAMAERKQLPFAEKLSHVH 322 


Query: 122 LKYQMSLPASLTVTAVAI FWLFV 144 

++ A A+A+ + + 

Sbjct: 323 PSFRTPWIAISFQIALALIMMLI 345 

There is also homology to SEQ ID 3114.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2014 

A DNA sequence (GBSx2125) was identified in S.agalactiae <SEQ ID 6227> which encodes the amino 
acid sequence <SEQ ID 6228>. Analysis of this protein sequence reveals the following:
Possible site: 20 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1849 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9439> which encodes amino acid sequence <SEQ ID 9440> 
was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD23454 GB:AF117741 cochaperonin GroES [Streptococcus pneumoniae] 
Identities = 31/52 (59%), Positives = 42/52 (80%)

Query: 2 GDGIRTLTGELVAPSVAEGDTVI,VENGAGLEvT<DGNEKVTVVRESDIVAvVK 53

G G+RTL G+LVAPSV GD VLVE AGL+VKDG+EK +V E++I+A+++ 
Sbjct: 42 GQGTOTLNGDLVAPSVKTGDRVLVEAHAGLDVKDGDEKYIIVGEANILAIIE 93 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6229> which encodes the amino acid 
sequence <SEQ ID 6230>. Analysis of this protein sequence reveals the following:

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3290 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
An alignment of the GAS and GBS proteins is shown below.

Identities = 29/49 (59%) , Positives = 39/49 (79%) 

Query: 4 G I RTLTGELVAP S VAEGDTVL VENGAGLEVKDGNE KVTWRE SDI VAW 52 

G+RT+TG+ V PSV+ G VLVENG LEV +EKV+++RESDI+A+V 
Sbjct: 60 GWTITGDSVLPSVSVGQEVLVENGHDLEVTVDDEKVSIIRESDIIAIV 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2015 

A DNA sequence (GBSx2126) was identified in S.agalactiae <SEQ ID 6231> which encodes the amino 
acid sequence <SEQ ID 6232>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1272 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD23455 GB:AF117741 chaperonin GroEL [Streptococcus pneumoniae] 
Identities = 472/539 (87%), Positives = 513/539 (94%), Gaps = 1/539 (0%) 

Query: 1 MAKDIKFSADARSAMVRGVDIIADTVKVTLGPKGRNVVLEKAFGSPLITiroGVTIAKEIE 60 

M+ K+ 1 KFS+DARSAMVRGVDI LADTVKVTLGPK RNWLEK+FGSPLITNDGVTIAKEIE 
Sbjct: 1 MSKEIKFSSDARSA^WRGVDIIlADTVKVTLGPKDR^^VVLEKSFGSPLITlroGvTIAKEIE 60 

Query: 61 LEDHFEI^GAKLVSEVASKTiroiAGDGTTTATVLTC^ITOEGLKNVTAGANPIGIRRGIE 120 

LEDHFENMGAKLVSE+ASKTNDIAGDGTTTATVLTQAITOEG+KNVTAGANPIGIRRGIE 
Sbjct: 61 LEDHFENMGAKLVSEIASKTISIDIAGDGTTTATVLTQAIVREGIKNVTAGANPIGIRRGIE 120 

Query: 121 TAVSAAVEELKEIAQPVSGKEAIAQVAAVSSRShKVGEYISEAMERVGNDGVITIEESRG 180 

TAV+AAVE LK A PV+ KEAI +QVAAVSSRSEKVGEYI SEAME+VG DGVITTEESRG 
Sbjct: 121 TAVAAAVEALKNNAIPVANKEAISQVAAVSSRSEICVGEYISEAMEKVGKDGVITIEESRG 180 

Query: 181 METELEWEGMQFDRGYLSQYMVTDNEKMVSELENPYILITDKKISNIQEILPLLEEVLK 240 

METELEWEGMQFDRGYLSQYtWTD+EKMV++LENPYILITDKKISNIQEILPLLE +L+ 
Sbjct: 181 METELEWEGMQFDRGYLSQYMVTDSEKI1VADLENPYI LI TDKKI SNI QE I LPLLES I LQ 240 

Query: 241 TITOPLLIIADDVDGEALPTLVI^KIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTVVT 300 

+NRPLLIIADDVDGFJU^PTLVLNKIRGTFNVVAVKAPGFGDRRKAMLEDIAILTGGTV+T 
Sbjct: 241 SmPLLIIADDVDGlALPTLVLNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTVIT 300 

Query: 301 EDLGLDLKDATMQVLGQSAKVTVDKDSTVIVEGAGDSSAIANRVAI I KSQMEATTSDFDR 360 

EDLGL+LKDAT++ LGQ+A+VTVDKDSTVIVEGAG+ AI++RVA+IKSQ+E TTS+FDR 
Sbjct: 301 EDLGLELKDATIEALGQAARVTVDKDSTVIVEGAGNPEAISHRVAVIKSQIETTTSEFDR 360 

Query: 361 EKLQERIjAKLAGGVAVIKVGAATETELKEMKLRIEDAI^ATRAAvEEGIVSGGGTALV^IV 420 

EKLQERLAKL+GGVAVIKVGAATETELKEMKLRIEDAIiNATRAAVEEGIV+GGGTAL NV 
Sbjct: 361 EKLQERLAKLSGGVAVIKVGAATETELKEMJO^IEDAI^ATRAAVEEGIVAGGGTALANV 420 

Query: 421 IEKVAALKIMGDEETGRNIVLRALEEPVRQIAYNAGYEGSVIIERLKQSEIGTGFNAANG 480 

I A L+L GDE TGRNIVLRALEEPVRQIA+NAG+EGS++I+RLK +E+G GFNAA G 
Sbjct: 421 IPAEATLELTGDEATGRNIVLRALEEPVRQIAHNAGFEGSIVIDRLKNAELGIGFNAATG 480 

Query: 481 EWVD^WTTGIIDPvKVTRSALQNAASVASLILTTEAvVANKPEPEAPTAPAMr)PSMMGG 539 

EWV+M+ GIIDPVKV+RSAl^NAASVASLILTIBAVVANKPEP AP APAMDPSMMGG 
Sbjct: 481 EWVMyiID®3IIDPVKVSRSALQNAASVASLILTTEAVVANKPEPVAP-APAMDPSMMGG 538 
A related DNA sequence was identified in S.pyogenes <SEQ ID 6233> which encodes the amino acid
sequence <SEQ ID 6234>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1070 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 491/543 (90%) , Positives = 515/543 (94%) , Gaps = 3/543 (0%) 

Query: 1 MAKDIKFSADARSAMTOGVDILADTVOT 60 
MAKDIKFSADAR+AMTOGTO+LADTOOT

Sbjct: 3 MAKDIKFSADAPAAMWGVDMLADTVKOT 62 

Query: 61 LEDHFENMGAKLVSEVASKTNDIAGDGTTTATVnTQAITOEGLKNVTAGANPIGIRRGIE 120 
LEDHFENMGAKLVSEVASKTNDIAGDGTTTATVLTQAIV EGLKNVTAGANPIGIRRGIE 
Sbjct: 63 LEDHFENMGAKLVSEVASKTNDIAGDGTTTATVLTQAIVHEGLKNUTAGANPIGIRRGIE 122



Query: 121 TAVSAAVEELKEIAQPVSGKEAIAQVAAVSSRSEKVGEYISEAMERVGNDGVITIEESRG 180 

TA + AVE LK IAQPVSGKEAIAQVAAVSSR3EKVGEYISEAMERVGNDGVITIEESRG 
Sbjct: 123 TATATAVEALICAIAQPVSGICEAIAQVAAVSSRSEKVGEYISEAMERVGNDGVITIEESRG 182 

Query: 181 METELEWEGMQFDRGYLSQYMVTDNEKMVSELENPYILITDICKISNIQEILPLLEEVLK 240 

METELEWEGMQFDRGYLSQYMVTDNEKMV+ +LE!NP+ IL1 TDKK+ SNIQ+ 1 LPLLEEVLK 
Sbjct: 183 MBTELE WEGMQFDRGYLSQYMVTDNEKMVftDLENPFILITDKKVSNIQD I LPLLEEVLK 242 

Query: 241 TNRPLLlIADDVDGEALPTLVIjNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTVVT 300 

TIffiPLLIIADDVDGEALPTLVIjNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTV+T 
Sbjct: 243 TmPLLIIADDVDGEALPTLVLNKIRGTFNWAVKAPGFGDRRKAMLEDIAILTGGTVIT 302 

Query: 301 EDLGLDLIOJATMQVLGQSAKVTvDKDSTVIVEGAGDSSAIANRVAIIKSQMEATTSDFDR 360 

EDLGL+LKDATM LGQ+AK+TVDKDSTVIVEG+G S AIANR+A+IKSQ+E TTSDFDR 
Sbjct: 303 EDLGLELKDATMTALGQAAKITVDKDSTVIVEGSGSSEAIANRIALIKSQLETTTSDFDR 362 

Query: 361 EKLQERIAKIAGGVAVIKVGAATETELKEMKLRIEDALNATRAAVEEGIVSGGGTALVNV 420 

EKLQERLAKLAGGVAVIKVGA TET LKEMKLRIEDALNATRAAVEEGIV+GGGTAL+ V 
Sbjct: 363 EKLQERIJU0LAGGVAV1KVGAPTETALKEMKLRIEDALNATRAAVEEGIVAGGGTALITV 422 

Query: 421 IEKVAALKLNGDEETGRlsTIVLRALEEPVRQIAYNAGYEGSVIIERLKQSEIGTGFNAANG 480 

IEKVAAL4-L GD+ TGRNIVLRALEEPVRQIA NAGYEGSV+I++LK S GTGFNAA G 
Sbjct: 423 IEKVAALELEGDDATGRNIVLRALEEPVRQIALNAGYEGSWIDKLKNSPAGTGFNAATG 482 

Query: 481 EWVDMVTTGIIDPVKVTRSALOTAASVASLIL^ 537 

EWVDM+ TGIIDPVKVTRSALQNAASVASLILTTEAWANKPEP AP PA MDP MM 
Sbjct: 483 EWVDMIKTGIIDPVKVTRSALQNAASVASLILTTEAVVANKPEPATPAPAMPAGMDPGMM 542 

Query: 538 GGF 540 
GGF 

Sbjct: 543 GGF 545 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2016 

A DNA sequence (GBSx2127) was identified in S.agalactiae <SEQ ID 6235> which encodes the amino 
acid sequence <SEQ ID 6236>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence
Final Results

bacterial cytoplasm Certainty= 0.3 2 16 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10247> which encodes amino acid sequence <SEQ ID 
10248> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06113 GB:AP001515 transcriptional regulator (GntR family) 
[Bacillus halodurans] 
Identities = 50/171 (29%) , Positives = 86/171 (50%) , Gaps = 17/171 (9%) 

Query: 21 HVQVYNKIFNMIQDGTYSPGMQLPSEPELAGQLNVSRa.TLRKSLALLQEDHLVKNIRGKG 80 

++QV +K+ + +4- G Y G +LPSE EL+ QL VSRATLR++L LL+E+ +V G G 
Sbjct: 10 YLQVIDKLKHDMEAGVYEEGEKLPSEFELSKQLGVSRATLREALRLLEEEGVWRRHGVG 69 

Query: 81 NFIRENSSNLSENGYENRQHPIKTCLTSKITEVELE-- 

F+ ++ L G E +T I ++E 

Sbjct: 70 TFV- -HTKPLFSAGIEELY S VTDMIRHADMEPGT I FLSS YQI EATDDDKRRFQTD 122 

Query: 133 ETP VWI ADRWYHTDDGPLAYTLS F I P IELISDAE I SLHDTKQLLNFI EEG 183 

+++ +R D P+ Y L +P ELI + S+H+ +L+ +E G 
Sbjct: 123 NLDQLMMIERVRTADGVPIVYCLDKLPAELI- -GQHSVHEINSILDHLESG 171 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6237> which encodes the amino acid 
sequence <SEQ ID 6238>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2297 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 154/244 (63%) , Positives = 189/244 (77%) 

Query: 7 MPKNELNNraKLKHVQVYNKIFNMIQDGTYSPGMQLP^ 66 

M N+L KL KLKHVQVYN IF + 1 QDGT YS PGMQLPSEPELA QLNVSR TLRKSLAL 
Sbjct: 1 MSTNDLTKKLKKLKHVQvYNTIFQLIQDGTYSPGMQLPSEPEL^^ 60 

Query: 67 LQEDHLVKNIRGKGNFIRENSSNLSE.MGYENRQHPIKTCLTSKITEVELEFRVEVPAEAI 126 

LQEDHL+KNIRGKGNFI + G+E QHPI L+S IT+VELE+R+EVP AI 

Sbjct: 61 LQEDHLIKNIRGKGNFILKTPETKYHQGFEYLQHPIYASLSSDITICVELEYRIEVPTVAI 120 

Query: 127 TASLKQETPWVIADRWYHTDDGPIAYTLSFIPIELISDAEISLHDTKQLLNFIEEGIYQ 186 

TASLKQETPW+I DRWYH+ + +AY+LSFIPIE+IS I+L+ + hh F+EE IY+ 
Sbjct: 121 TASLKQETPWIIVDRWYHSQNKAIAYSLSFIPIEVISKYAINLNQEEPLLTFLEEKIYE 180 

Query: 187 EGISSHSQSHLGYATSGNFSATKYTLSDHGQFILIQETIFKQEKILMCNKHYVPIEHFEL 246 

G +SHS + +GY +GN++ATKYTLS++ FILIQET++ + IL+ KHYVP + F+L 
Sbjct: 181 SGKASHSCNQIGYTKTGNYTATKYTLSEKSAFILIQETLYNGKDILVSTKHYVPADLFDL 240 

Query: 247 SITS 250 
+ S 

Sbjct: 241 KVQS 244 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2017 

A DNA sequence (GBSx2128) was identified in S.agalactiae <SEQ ID 6239> which encodes the amino 
acid sequence <SEQ ID 6240>. This protein is predicted to be purine nucleoside phosphorylase (udp-1). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65977 GB:AE001270 uridine phosphorylase (udp) [Treponema 
pallidum] 

Identities = 145/246 (58%) , Positives = 171/246 (68%) 

Query: 11 QYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTLNGEKVSVTST 70 
 +YH+ ++ D+G YVI+PGDP R KIA+HF + V +REYVTYTGTL VSV ST 
Sbjct: 10 

Query: 71 
Sbjct: 70 

Query: 131 
Sbjct: 130 

Query: 191 
 !' ASEMESAALFV S VR G+ LV+GNQ R A G+++ 
Sbjct: 190 

Query: 251 
Sbjct: 250 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6241> which encodes the amino acid 
sequence <SEQ ID 6242>. Analysis of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3910 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 259/259 (100%) , Positives = 259/259 (100%) 

Query: 1 MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL 60 
 MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL 
Sbjct: 1 MQNYSGEVGLQYHLQIRPGDVGRYVIMPGDPKRCAKIAEHFDNAVLVADSREYVTYTGTL 60 

Query: 61 NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME 120 
 NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME 
Sbjct: 61 NGEKVSVTSTGIGGPSASIAMEELKLCGADTFIRVGTCGGIDLDVKGGDIVIATGAIRME 120 

Query: 121 GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL 180 
 GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL 
Sbjct: 121 GTSKEYAPIEFPAVADLEVTNALVNAAKKLGYTSHAGVVQCKDAFYGQHEPERMPVSYEL 180 

Query: 181 LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA 240 
 LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA 
Sbjct: 181 LNKWEAWKRLGTKASEMESAALFVAASHLGVRCGSDFLVVGNQERNALGMDNPMAHDTEA 240 

Query: 241 AIQVAVEALRTLIENDKSQ 259 

AIQVAVEALRTLIENDKSQ 
Sbjct: 241 AIQVAVEALRTLIENDKSQ 259 
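
The Identities, Positives and Gaps figures quoted for each alignment can be read off a BLAST-style block: a letter on the middle line marks an identical residue, a `+` marks a conservative substitution, and `-` in either sequence row marks a gap. The following is a minimal sketch of that bookkeeping. The function name `alignment_stats` and the short toy row are illustrative assumptions and not part of the original analysis; only the last block (residues 241 to 259) is taken from the alignment above.

```python
def alignment_stats(query: str, midline: str, subject: str) -> dict:
    """Count identities, positives and gaps for one BLAST-style alignment block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three rows must have the same number of columns")
    # A letter on the middle line means the two residues are identical.
    identities = sum(1 for ch in midline if ch.isalpha())
    # '+' marks a conservative substitution; positives include identities.
    positives = identities + midline.count("+")
    # '-' in either sequence row is a gap column.
    gaps = query.count("-") + subject.count("-")
    return {
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "length": len(query),
    }


# Last block of the GAS/GBS alignment above: every column is identical.
last_block = alignment_stats(
    "AIQVAVEALRTLIENDKSQ",
    "AIQVAVEALRTLIENDKSQ",
    "AIQVAVEALRTLIENDKSQ",
)

# Hypothetical toy row (not from the document) with one '+' and one gap.
toy_block = alignment_stats("MKTAYIAK", "M+TA I K", "MRTA-IGK")

print(last_block)
print(toy_block)
```

Summing these counts over all blocks of an alignment and dividing by the total number of columns gives percentages of the kind reported in the alignment headers.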



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2018 

A DNA sequence (GBSx2129) was identified in S.agalactiae <SEQ ID 6243> which encodes the amino 
acid sequence <SEQ ID 6244>. This protein is predicted to be a nucleoside transporter. Analysis of this 
protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -9.45 Transmembrane 35 - 51 ( 30 - 57) 
INTEGRAL Likelihood = -9.29 Transmembrane 8 - 24 ( 1 - 28) 
INTEGRAL Likelihood = -8.07 Transmembrane 388 - 404 ( 379 - 404) 
INTEGRAL Likelihood = -7.27 Transmembrane 104 - 120 ( 100 - 127) 
INTEGRAL Likelihood = -6.58 Transmembrane 259 - 275 ( 255 - 284) 
INTEGRAL Likelihood = -4.35 Transmembrane 172 - 188 ( 171 - 190) 
INTEGRAL Likelihood = -3.50 Transmembrane 200 - 216 ( 199 - 221) 
INTEGRAL Likelihood = -2.18 Transmembrane 352 - 368 ( 352 - 371) 

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related GBS nucleic acid sequence <SEQ ID 10245> which encodes amino acid sequence <SEQ ID 
10246> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05165 GB:AP001512 nucleoside transporter [Bacillus halodurans] 
Identities = 160/405 (39%) , Positives = 256/405 (62%) , Gaps = 8/405 (1%) 

Query: 5 MQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQWSWS 64 
 M ++ ++GI++V I +A S NR+++ I L +Q + A+I+++IP GQ ++ ++ 
Sbjct: 1 MNILWGLLGIWVFLIAFAFSTNRRAIKPRTILGGLAIQLLFAI IVLKIPAGQALLESLT 60 

Query: 65 TGVTKVINCGQAGmFVFGSLADSGAKTGFIFAIQTLGNIVFLSALVSLLYYVGILGFW 124 
 V +1+ G++FVFG + G+ GF+FAI L ++F SAL+S+LYY+GI+ FV+ 
Sbjct: 61 

Query: 125 
 A AN+F+GQT++P++V j +MT SE+ V+ G+ S++ 
Sbjct: 121 

Query: 185 
Sbjct: 181 

Query: 244 
 IDA A GASTG + +1 A L+AFV L++LIN +L +G 
Sbjct: 241 

Query: 297 
 L G+ +G KL++NEFV++ 
Sbjct: 301 IAWIGIPWAEALQAGSYIGQKI,\^rMEF^AYLSFAPEIENLSDKAVMVISFALCGFiyNIFS 360 

Query: 357 SLGICVSGIAVLCPEKRGTLARLVFRAMIGGIAVSMLSAFIVGIV 401 
 SLGI + G4 L P +R +ARL RA++ G S+LSA I G++ 
Sbjct: 361 SLGILLGGLGKLAPSRRPDIARLGLRAILAGTLASLLSASIAGML 405 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6245> which encodes the amino acid 
sequence <SEQ ID 6246>. Analysis of this protein sequence reveals the following: 



Possible site: 25 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = Transmembrane - 24 ( 1 - 28) 
INTEGRAL Likelihood = Transmembrane - 404 ( 379 - 404) 
INTEGRAL Likelihood = Transmembrane - 120 ( 100 - 127) 
INTEGRAL Likelihood = Transmembrane - 275 ( 255 - 284) 
INTEGRAL Likelihood = Transmembrane 172 - 188 
INTEGRAL Likelihood = Transmembrane 352 - 

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:BAB05165 GB:AP001512 nucleoside transporter [Bacillus halodurans] 
Identities = 160/405 (39%) , Positives = 257/405 (62%) , Gaps = 8/405 (1%) 

Query: 5 MQFIYS I IGILLVLGI VYAISFNRKSVSLSLIGKALIVQFI IALILVRI PLGQQI VSWS 64 
 M ++ ++GI++V I +A S NR+++ I L +Q + A+I+++IP GQ ++ ++ 

Sbjct: 1 MNILWGLLGIWVFLIAFAFSTNRRAIKPRTILGGLAIQLLFAIIVLKIPAGQALLESLT 60 

Query: 65 TGVTSVINCGQAGLNFVFGSLADSGAKTGFIFAIQTLGNIVFLSALVSLLYYVGILGFW 124 
V ++I+ G++FVFG + G+ GF+FAI L ++F SAL+S+LYY+GI+ FV+ 
Sbjct: 61 NVVI^IISYANEGIDFVFGGFFEEGSGVGFVFAINVLSVVIFFSALISILYYLGIMQFVI 120 



Query: 125 KWIGKGVGKIMKSSEVESFVAVANMFLGQTDSPILVSKYLGRMTDSEIMVVLVSGMGSMS 184 

K IG + ++ +S+ ES A AN+F+GQT++P++V YL +MT SE+ V+ G+ S++ 
Sbjct: 121 KIIGGALSWLLGTSKAESMSAAANIFVGQTEAPLWKPYLPKMTQSELFAVMTGGLASVA 180 


Query: 185 VSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEPVQKI-DDIKMDNKGNNANV 243 

S+L GY LG+P++YLL AS M +++AK+++P+TE DD K+ + N+ 

Sbjct: 181 GSVLIGYSLLGVPLQYLIAASFMAAPAGLIMAKMIMPETEKTTDAEDDFKLAKDEESTNL 240 

Query: 244 IDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLG IRLEQIFSYVFAP 295 

IDA A GASTG + +1 A L+AFV L++LIN +L +G + LE I YVFAP 

Sbjct: 241 IDAAANGASTGLMLVLNIAAMLLAFVALIALINGILGWIGGLFGASQLSI.ELILGYVFAP 300 



Query: 297 FGFLMGFDHKNILLEGNLLGSKLILNEFVSFQQLGHLIKSLDYRTALVATISLCGFANLS 356 

F++G h G+ +G KL++NEFV++ I++L + +V + +LCGFAN S 

Sbjct: 301 LAFVIGIPWAEALQAGSYIGQKLWNEFVAYLSFAPEIENLSDKAVMVISFALCGFANFS 360 

Query: 357 SLGIOTSGIAVLCPEKESTLARLVFRAMIGGIAVSMLSAFIVGIV 401 

SLGI + G+ L P +R +ARL RA++ G S+LSA I G++ 
Sbjct: 361 SLGILLGGLGKLAPSRRPDIARLGLRAILAGTLASLLSASIAGNL 405 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 399/404 (98%) , Positives = 401/404 (98%) 

Query: 1 MEVIMQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQW 60 

+EVIMQFIYSI IGILLVLGI VYAISFNRKSVSLSLIGKALIVQFI IALILVRI PLGQQ+V 
Sbjct: 1 LEVIMQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQIV 60 

Query: 61 SWSTGVTKVINCGQAGLNFVFGSLADSGAKTGFIFAIQTLGNIVFLSALVSLLYYVGIL 120 

Query: 121 GFWKWIGKGVGKIMKSSEVESFVAVANMFLGQTDSPILVSKYLGRMTDSEIMVVLVSGM 180 

GFWKWIGKGVGKIMKSSEVESPVA^H^FI^QTDSPILVSICTLGRMTDSEI^WVLVSGM 
Sbjct: 121 G 



Query: 181 GSMSVSILGGYIALGIPMEYLLIASTMVPIGSILIAKILLPQTEWQKIDDIKMDNKGNN 240 

GSMSVSILGGYIALGIPMEYLLIASTMVPIGEILIAKILLPQTEPVQKIDDIKMDNKGNN 
Sbjct: 181 GSMSVSILGGYIALGIPMEYLLIASTMVPIGSILIAK1LLPQTEPVQKIDDIKMDNKGNN 240 

Query: 241 ANVIDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLGIRLEQIFSYVFAPFGFL 300 

ANVIDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLGIRLEQIFSYVFAPFGFL 
Sbjct: 241 ANVIDAIAEGASTGAQMAFSIGASLIAWGLVSLINMMLSGLGIRLEQIFSVVFAPFGFL 3 00 

Query: 301 MGFDHKNILLEGNLLGSKLlIiNEFVSFQQLGDLIXSLDYRTALVATISLCGFAKLSSLGI 360 

MGFDHKNILLEGNLLGSKLILNEFVSFQQLG LIKSLDYRTALVATISLCGFANLSSLGI 
Sbjct: 301 MGFDHKNILLEGNLLGSKLinNEFVSFQQLGHLIXSLDYRTALVATISLCGFANLSSLGI 360 

Query: 361 CTSGIAVLCPEKRGTLARLVFRAMIGGIAVSKLSAFIVGIVTLF 404 

CVSGIAVLCPEKR TLARLVFRAMIGGIAVSKLSAFIVGIVTLF 
Sbjct: 361 CVSGIAVLCPEKRSTLARLVFRAMIGGIAVSKLSAFIVGIVTLF 404 



A related GBS gene <SEQ ID 8955> and protein <SEQ ID 8956> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: Crend: 
McG: Discrim Score: 13.83 
GvH: Signal Score (-7.5): -2.63 

>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 8 value: -9.45 threshold: 0. 

INTEGRAL Likelihood = -9. Transmembrane - 51 ( 30 - 
INTEGRAL Likelihood = -9 Transmembrane - 24 ( 1 - 
INTEGRAL Likelihood = -8. Transmembrane - 404 ( 379 - 
INTEGRAL Likelihood = -7. Transmembrane - 120 ( 100 - 
INTEGRAL Likelihood = -6 Transmembrane - 275 ( 255 - 
INTEGRAL Likelihood = -4 Transmembrane - 188 ( 171 - 
INTEGRAL Likelihood = -3. Transmembrane - 216 ( 199 - 
INTEGRAL Likelihood = -2. Transmembrane - 368 ( 352 - 

Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

ORF01622(313 - 1512 of 1812) 
GP|9656920|gb|AAF95495.1||AE004305(1 - 418 of 418) NupC family protein {Vibrio cholerae} 
%Match = 24.0 
%Identity = 39.5 %Similarity = 65.7 
Matches = 160 Mismatches = 134 Conservative Sub.s 

276 306 336 366 396 


:*STPHTY*K**ITISEVLEVIMQFIYSIIGILLVLGIVYAISFNRKSVSLSLIGKALIVQFIIALILVRIPLGQQWSV 
I = hlh ::||| =1 ll! = = = l =1 h =11 = == =1 ||::s 
MSLFMSLIG^VLLGIAVLLSSNRKAINLRTVGGAFAIQFSLGAFILYVPWGQELLRG 



VSTGVTKVINCGQAGLNFVFGSLADSG- - • 
1 1= 111 I I I 



■-AKTGFIFAIQTLSNIVFLSALVSLLYWGILGFWKWIGKGVGKIMKS 
Mill : I =M = llh|:|MM = = =|:= =1 1= I = = 

FSDAVSJWimGMDGTSFLFGGLVSGKMFEVFGGGGFIFAFRVLPTLIFFSALISVLYYLGVMQWVIRILGGGLQKALGT 
70 80 90 100 110 120 130 

741 771 801 831 861 891 921 951 

SEVESFVAVANMFLGQTDSPILVSKYLGRMTDSEIMWLVSGMGSMSVSILGGYIALGIPMEYLL1ASTMVPIGSILIAK 
I II I ||:|:|||::|::| == =11 11= 1= h l = = =1 II = = l = =llh III I =1 II 
SRAESMSAARNIFVGQTFAPLVVREFVPKMTQSELFAVMCGGIASIAGG^GYASMGVKIEYLVAASF^PGGLIjFAK 
150 160 170 180 190 200 210 

981 1011 1038 1068 1098 1128 1167 

1LLPQTEPVQKIDDIKMDNKGNN-ANVIDAIAEGASTGAQMAFSIGASLIAFVGLVSLINMMLSGLG IRLEQI 

= = = hll I =11 =1 = llllll I III I l = l = = = ll lll|:||::||| II 1 = 1 >>|| = 

L^PETEKPQDNEDITLDGGDDKPANVIDAAAGGASAGLQIAjNVGAMLIAFIGLIALINGMLGGIGGWFGMPEBKLEML 
230 240 250 260 270 280 290 


1197 1227 1257 1287 1305 1332 1362 1392 

FSYVFAPFGFLMGFDHKNILLEGNLLGSKLILNEFVSFQQ LGDLIKS-LDYRTALVATISLCGFANLSSLGICVSG 

= ==111= 11=1 : I ::| I : ||||:: | | = I =1 = = =111111111= I = I 

LGWLFAPIAFLIGVPWNFATOAGEFIGLKTVANEFVAYSQFAPYLTFjyiPVVLSEKTKAIlSFALCGFANLSSIAILLGG 
310 320 330 340 350 360 370 

1422 1452 1482 1512 1542 1572 1602 1632 

IAVLCPEKRGTLARLVFRAMIGGIAVSMLSAFIVGIOTLF*KIjTKERRIVTl , IK*KIF*KR*TILC*QQQQHGQKSKQF*M 
: I |::|| :||: :|:| | ::::| | | | 
LGSLAPKRRGDIARMGVKAVIAGTLSNLMAATIAGFFLSF 
390 400 410 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2019 

A DNA sequence (GBSx2130) was identified in S.agalactiae <SEQ ID 6247> which encodes the amino 
acid sequence <SEQ ID 6248>. This protein is predicted to be deoxyribose-phosphate aldolase (deoC). 
Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2196 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA81646 GB:Z27121 deoxyribose aldolase [Mycoplasma hominis] 
Identities = 99/199 (49%) , Positives = 140/199 (69%) , Gaps = 1/199 (0%) 

Query: 5 DILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGK-LAICTVI 63 
 ++ K +DHT L+ +AT +I ++ +A+ Y+ S CI SYVK A E + + +CTVI 
Sbjct: 3 ELNKYIDHTNLSPSATSKDIDKLIQEAIKYDFKSVCIAPSYVKYAKEALKNSDVLVCTVI 62 

Query: 64 GFPNGYSTTAAKVFECQDAIKNGADEIDMVINLTDVKNGDFDTVEEEIRQIKAACQDHIL 123 
 GFP GY+ T+ KV+E + A+++GADEIDMVIN+ K+G ++ V EI+ IK AC L 
Sbjct: 63 GFPLGYNATSVKVYETKIAVEHGADEID]WINvGRFKDGQYEYvlilSn3IKAIKEACNGKTrj 122 

Query: 124 KVIVETCQLTKEELIELCGVVTRSGADFIKTSTGFSTAGATFEDVEVMAKYVGEGVKIKA 183 
 KVIVET LTK ELI++ +V +SGADFIKTSTGFS GA+FED++ M + G+ + IKA 
Sbjct: 123 KVIVETALLTKAELIKITELVMQSGADFIKTSTGFSYRGASFEDIQTMKETCGDKLLIKA 182 

Query: 184 AGGISSLEDAEKFIALGAS 202 
 +GGI +L DA++ I LGA+ 
Sbjct: 183 SGGIKNLADAQEMIRLGAN 201 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6249> which encodes the amino acid 
sequence <SEQ ID 6250>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2196 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 
Identities = 211/223 (94%) , Positives = 217/223 (96%) 



Query: 1 MEVKDILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGKLAIC 60 
 +EVKDILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGKLAIC 
Sbjct: 1 VEVKDILKTVDHTLLATTATWPEIQTILDDAMAYETASACIPASYVKKAAEYVSGKLAIC 60 

Query: 61 TVIGFPNGYSTTAAKVFECQDAIKNGADEIDMVINLTDVKNGDFDTVEEEIRQIKAACQD 120 
 TVIGFPNGYSTTAAKVFECQDAI+NGADEIDMVINLTDVKNGDFDTVEEEIRQIKA CQD 
Sbjct: 61 TVIGFPNGYSTTAAKVFECQDAIQNGADEIDMVINLTDVKNGDFDTVEEEIRQIKAKCQD 120 

Query: 121 HILKVIVETCQLTKEELIELCGVVTRSGADFIKTSTGFSTAGATFEDVEVMAKYVGEGVK 180 
 HILKVIVETCQLTKEELIELCGVVTRSGADFIKTSTGFSTAGATFEDVEVMAKYVGEGVK 
Sbjct: 121 HILKVIVETCQLTKEELIELCGVVTRSGADFIKTSTGFSTAGATFEDVEVMAKYVGEGVK 180 

Query: 181 IKAAGGISSLEDAEKFIALGASRLGTSRIIKIVKNQKVEEGTY 223 
 IKAAGGISSLEDA+ FIALGASRLGTSRIIKIVKN+  +  +Y 
Sbjct: 181 IKAAGGISSLEDAKTFIALGASRLGTSRIIKIVKNEATKTDSY 223 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 2020 

A DNA sequence (GBSx2131) was identified in S.agalactiae <SEQ ID 6251> which encodes the amino 
acid sequence <SEQ ID 6252>. This protein is predicted to be phosphopentomutase (deoB). Analysis of 
this protein sequence reveals the following: 
Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0546 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC45496 GB:U80410 phosphopentomutase [Lactococcus lactis subsp. 
cremoris] 

Identities = 275/408 (67%) , Positives = 325/408 (79%) , Gaps = 7/408 (1%) 

Query: 3 QFDRIHLWLDSVGIGAAPDANDFVNAGVP DGASDTLGHISKTVGLAVPNMAK1 56 

+F RIHLW+DSVGIGAAPDA+ FN V D SDT+GHIS+ GL VPN+ K+ 

Sbjct: 4 KFGRIHLVVMDSVGIGAAPDADKFFNHDVETHEAINDVKSDTIGHISEIRGLDVPNLQKL 63 

Query: 57 GLGNIPRPQALKTVPAEENPSGYATKLQEVSLGEDTMTGHWEIMGIjNITEPFDTFWNGFP 116 

G GNIPR LKT+PA + P+ Y TKL+E+S GKDTMTGHWEIMGLNI PF T+ G+P 
Sbjct: 64 GWGNIPRESPLKTIPAAQKPAAY\TKLESISKGKDTMTGHWEIMGLNIQTPFPTYPEGYP 123 



Query: 117 EDIITKIEDFSGRKVIREANKPYSG?AVIDDFC-PRQMETGELIIYTSADPVLQIARHEDI 176 
ED++ KIE4FSGRK+IREANKPYSGTAVI+DFGPRQ+ETGELIIYTSADPVLQIAAHED+ 
Sbjct: 124 

Query: 177 
 I EELY+ICEY RSIT+E ++ GRIIARPYVC-E GNF RT R DYA+SPF +TVL 
Sbjct: 184 

Query: 236 
 KL +AGIDTY+VGKI+DIFN G+ +DMGHN ++ G+D L+K M +EF +GFSFTHBV 
Sbjct: 244 

Query: 296 
 DFDA YGHRRD GY 
Sbjct: 304 

Query: 356 
 ++PVGHFADISAT+A+NF V A GESFL LV 
Sbjct: 364 

There is ai 
Isohc 

Identities 

Query: 1 MSQFDRIHLWLDSVGIGAAPDANDFVNAGVPDGASDTLGHISKTVGLAVPNMAKIGLGM 60 
 MS+F+RIHLWLDSVGIGAAPDA+ F NAGV D SDTLGHIS+ GL+VPNMAKIGLGN 
Sbjct: 1 MSKFNRIHLWLDSVGIGAAPDADKFFNAGVADTDSDTLGHI SEAAGLSVPNMAKIGLGN 60 

Query: 61 
Sbjct: 61 

Query: 121 
 TKIE+FSGRK+IRFANKPYSGTAVIDDFGPRQMETGELI+YTSADPVLQIAAHEDIIP+E 
Sbjct: 121 

Query: 181 
 ELY+ ICEYARS IT+ERPALLGRI IARPYVG+PGKFTRTANRHDYAVS PF+DTVLNKL 
Sbjct: 181 

Query: 241 
 G+ TYAVGKINDI FNGSGI +DMGHNKSNSHGIDTLIKT+ L EF KGFSFTNLVDFDA 
Sbjct: 241 

Query: 301 
 YS SFTGNGLIP GHFADI SATVA+NFGVDTAK IGE S FL h 
Sbjct: 301 

Query: 361 
Sbjct: 361 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2021 

A DNA sequence (GBSx2132) was identified in S.agalactiae <SEQ ID 6253> which encodes the amino 
acid sequence <SEQ ID 6254>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-12.05 Transmembrane 9 - 25 ( 4 - 35) 

Final Results 

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 
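
Each `INTEGRAL` line in these analyses reports a likelihood score, a core transmembrane segment and, in parentheses, an extended range around it. As a rough sketch of how such lines could be turned into structured data (the regular expression and the function name `parse_tm_segments` are assumptions made for illustration; the original analysis does not describe any parsing step), using the line reported above for SEQ ID 6254:

```python
import re

# Matches lines such as:
#   INTEGRAL Likelihood =-12.05 Transmembrane 9 - 25 ( 4 - 35)
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(report: str) -> list:
    """Extract likelihood, core range and extended range from INTEGRAL lines."""
    segments = []
    for match in TM_LINE.finditer(report):
        segments.append({
            "likelihood": float(match.group(1)),
            "core": (int(match.group(2)), int(match.group(3))),
            "extended": (int(match.group(4)), int(match.group(5))),
        })
    return segments


report = "INTEGRAL Likelihood =-12.05 Transmembrane 9 - 25 ( 4 - 35)"
segments = parse_tm_segments(report)

for seg in segments:
    start, end = seg["core"]
    # Core length in residues, counting both end points.
    print(seg["likelihood"], seg["core"], seg["extended"], end - start + 1)
```

For the intact lines in this section the core segment spans 17 residues, which gives a quick sanity check when a coordinate has been damaged in transcription.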

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 6255> which encodes the amino acid 
sequence <SEQ ID 6256>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.57 Transmembrane 41 - 57 ( 38 - 60) 

Final Results 

bacterial membrane --- Certainty=0.3230 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related sequence was also identified in GAS <SEQ ID 9143> which encodes the amino acid sequence 
<SEQ ID 9144>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 49 
>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -5.57 Transmembrane 13 - 29 ( 10 - 32) 

Final Results 

bacterial membrane --- Certainty=0.323 (Affirmative) < succ> 

bacterial outside --- Certainty=0.000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 276/544 (50%), Positives = 368/544 (66%), Gaps = 5/544 (0%) 

Query: 5 FKKKVVKVCLVIFGIVLVSLLSLGFFYFSKGQVLSRFVAARSRTSGQAFDNIKEYMVWSD 64 
F K +K +1 L L G FY+SK *+ ++ ARS SG F+NIK Y+W D 
30 Sbjct: 33 FHHKKLKQITIIAATSLFLFLIGGAFYYSKNHCINAYLKARSAQSGPVFENIKAYLVIMDD 92 

Query: 65 TGESITNDFANYANFEPLSKSEARKLGQEIKEGNKNDSMYLKRVGSRLGIFPDYRIANKP 124 

T E ITNDEA Y F S+ E R+ Q++K +++ ++ +K VG R IFPDYRIA KP 
Sbjct: 93 TNEQITNDEAMYTKFRRYSQKELRQKKQDLKAASQDSAVQVKSVGRRFWIFPDYRIAIKP 152 


Query: 125 MSLTLKTNVPKLDVLLNQKKVATSNSDHFSVTVERLPRTHYTASLEGTSDGKEIKLKKDY 184 

M LT+KTNVP+ DVLLNQKKVA S+S+ FSV +4-RLP YTAS+ G +G+ IK+ K Y 
Sbjct: 153 ITOLTIKTNVPQADVLLNQKIWAVSDSEQFSVKLDRLPTAEYTASIRGKHNGRNIKVNKSY 212 

40 Query: 185 DGKNQTIDLSVAFKSFTvTSNLMDGNLYFGDNRIAKLKDGSHSVENYPVTDGSKAYIKKV 244 

DG N +DLSV+F++F VTSN G+LYF DN I LKDG VE+YPVT+ ++AY+K 
Sbjct: 213 DGDNPVLDLSVSFRTFLVTSNAKQGDLYFDDNHIGTLKEGQLQVEDYPVTENAQAYMKTT 272 

Query: 245 FNDGEITSHKQKLISIADNQTIKLDVTJGLLNEKEAGQKLITAFNQLILYVSTGQDPQTLG 304 
 F.DGE+ S K L + + T+++ V LL E +AG+ L++AF+QL+ Y+STGQD h 

Sbjct: 273 FPDGELRSQKYALADVEEGATLEILVTDLLEEDKAGELLVSAFDQLMHYLSTGQDSSNLR 332 

Query: 305 TVFEKGAENDFYKGLKES I ICAKFVTDNRKASHFTIPNI VLNKMTQVGKESYQVNFAADYD 364 
+VFE G+ N FY+GLKESIKAKF TD RKAS IP+I+L MTQVGK +Y ++F A Y+ 
Sbjct: 333 SVFEAGSSNAFYRGLKESIKAKFQTDTRKASRLNIPSILLTTMTQVGKTTYVLDFTATYE 392 

Query: 365 FNYDKSTDPDKKTYGHIIQNLTGNFIMKKSGNSYLISNDGKKDITVAKETNKVKADPVSI 424 

F YDKSTDP++ T GHI Q+LTG +KK G YLIS G K+ITV KE N++KA S+ 
Sbjct: 393 FLYDKSTDPEQHTSGHINQDLTGKVTVKKVGQHYLISQSGSKNITVVKEDNQLKAP--SV 450 


Query: 425 FPENLVGSWKGEVEDGTVlMTFDKDGKVTQK-KVYKDSKSKESmSAKVTKLEDKGNGLY 483 

FPE+++G+W G+ ++ M+ DG +T K + K ++SKE+ +AK++K+EDKGNG Y 
Sbjct: 451 FPESILGTWTGQANGLSIHMSLASDGTITTKVEDQKGNRSKET-RTAKISKVEDKGNGFY 509 

Query: 484 LYQYESGTDTTTFV-TGGIGGLKVKYAYGIKIEGNKIIPVIWQTSSDGEFDYHKPLLSKP 542 

LY + G+D + V GG+GG VKYAYG KI G PV+WQ + EFDY KPL 
Sbjct: 510 LYTPDPGSDISALVPEGGLGGANVKYAYG?KISGKTASP\fVWQAALTHEFDYTKPLSGvT 569 

Query: 543 LTKQ 546 
L KQ 

Sbjct: 570 LQKQ 573 

A related DNA sequence was identified in S.pyogenes <SEQ ID 9065> which encodes amino acid sequence 
<SEQ ID 9066>. An alignment of the GAS and GBS sequences follows: 

Score = 47.3 bits (110), Expect = 4e-07 
Identities = 65/303 (21%), Positives = 119/303 (38%), Gaps = 16/303 (5%) 

Query: 153 FYI LGIGTS I S I WALTRFVKE I SLNFKEIKKLMJKMGIEVLSENENYSQI I EFDDI 209 
 +YIL + T 1+ +V + +S F +KKL KM + +QI EF D+ 
 IA LSHDIKT? 
Sbjct: 37 

Query: 210 
Sbjct: 96 

Query: 270 
Sbjct: 156 

Query: 322 
Sbjct: 213 

Query: 382 
Sbjct: 272 

Query: 440 
Sbjct: 332 

 - L R L NL+ NA -1 
 f- SR K -rG+GL 




A related sequence was also identified in GAS <SEQ ID 9135> which encodes the amino acid sequence 
<SEQ ID 9136>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -3.56 Transmembrane 145 - 161 ( 145 - 164) 



Final Results 

bacterial membrane --- Certainty=0.2423 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

SEQ ID 6254 (GBS280) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 8; MW 63.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 7; MW 88.7kDa). 
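
The two fusion constructs carry the same protein and differ only in their tag, so the difference between the two reported molecular weights should roughly reflect the extra mass of the GST tag over the His tag. The sketch below is a back-of-the-envelope check under stated assumptions: an average residue mass of about 110 Da (a common rule of thumb, not a value from this document), and a length of 546 residues, chosen only because it is the last query coordinate in the alignment above (the true length of SEQ ID 6254 may differ). The function name `estimate_mass_kda` is hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # assumed rule-of-thumb average mass per residue, in kDa


def estimate_mass_kda(n_residues: int, tag_kda: float = 0.0) -> float:
    """Rough protein mass estimate from residue count plus an optional tag mass."""
    return round(n_residues * AVG_RESIDUE_KDA + tag_kda, 1)


# Molecular weights reported in the text for the two fusion products.
his_fusion_kda = 63.7
gst_fusion_kda = 88.7

# Difference attributable to the change of tag.
tag_difference_kda = round(gst_fusion_kda - his_fusion_kda, 1)

# Untagged estimate for a hypothetical 546-residue protein.
untagged_estimate_kda = estimate_mass_kda(546)

print(tag_difference_kda, untagged_estimate_kda)
```

Under these assumptions the untagged estimate is of the same order as the reported His-fusion weight, and the difference between the two constructs is a single constant offset.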

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2022 

A DNA sequence (GBSx2133) was identified in S.agalactiae <SEQ ID 6257> which encodes the amino 
acid sequence <SEQ ID 6258>. This protein is predicted to be ribosomal large subunit pseudouridine 
synthase D (rluC). Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -4.62 Transmembrane 2 - 18 ( 1 - 19) 

Final Results 

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12749 GB:Z99108 similar to hypothetical proteins [Bacillus subtilis] 

Identities = 97/251 (38%) , Positives = 147/251 (57%) , Gaps = 15/251 (5%) 





Query: 86 KHVLINNEFINWQTWQENETITLIFDDEDYPTKKIPLGRAELIDCLYEDEHLIIVNKPE 145 
 + + +N+E + +V++ D++ +++ G +D L+ED H++I+NKP 
Sbjct: 43 QQIKVNHESVLMNMIVKKGDRVFIDLQES3ASSVIPEYGE---LDILFEDNHMLIINKPA 99 

Query: 146 GMKTHGNQPNEIALLNHVSAY SGQTCYV-- VHRLDMETSGAVLFAKNPFILPLINQ 199 
 G+ TH N+ + L ++ AY +G+TC V VHRLD +TSGA++FAK+ +++Q 
Sbjct: 100 GIATHPNEDGQTGTLANLIAYHYQINGETCra/RHVHRLDQDTSGAIVFAKHRIAHAILDQ 159 

Query: 200 RLERKEIWREYWALVEGKFSPKHQVLRDKIGRNR-HDRRKRIIDSKNGQHAMTIIDVL-- 256 
 +LE+K + R Y A+ EGK K + IGR+R H R+R+ S GQ A+T V+ 
Sbjct: 160 QLEKKTLKRTYTAIAEGKLRTKKGTINPPIGRDRSHPTRRRV--SPGGQTAVTHFKVMAS 217 

Query: 257 
 + SL4+ LETGRTHQIRVHL+ GHPL GD LY S R LHA+++ HP 
Sbjct: 218 NAXERLSLVELELETGRTHQIRVHIASLGHPLTGD^^ 277 

Query: 316 LTCETISVEAP 326 
 +T E I EAP 
Sbjct: 278 ITDELIVAEAP 288 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6259> which encodes the amino acid 
sequence <SEQ ID 6260>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4198 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 172/278 (61%) , Positives = 212/278 (75%) , Gaps = 2/278 (0%) 

Query: 63 TVKELLEDYFLIPRKIPJ1FLRVKKHVLINNEFINWQTVVQENDTITLIFDDEDYPTKKIP 122 

TVK LLE+ LIPRKIRHFLR KKHVLIN +NWQ+ V+ D + L FD EDYP K I 
Sbjct: 2 TVTCALLEEQLLIPRKIRHFLRTKKHVLINGHSVMfQSCVKYGDQVKLFFDHEDYPEKIIV 61 

Query: 123 LGRAELIDCLYEDEHLIIVNKPEGMKTHGNQPNEIALLNHVSAYSGQTCYWHRLDMETS 182 

+G+AE + CLYEDEH+ 1 1 VNKPEGMKTHGN P E+ALLNHVSAY+GQTCYWHRLD ETS 
Sbjct: 62 MGQAEKVTCLYEDEHIIIWKPEGMKTHGNDPTELALLNHVSAYTGQTCYVVHRLDKETS 121 



Query: 183 GAVLFAKNPFILPIiINQRLERKEIlVREYWALVEGKFSPKHQVLRDKIGRNRHDRRKRIID 242 

GA+DFAK PFILP++N+ LE+++I REY ALV G IGR+RHDRRKR++D 
Sbjct: 122 GAIIiFAKTPFILPILNRLLEKPJ)IHREYIAL'VKGSLDSPRVIYHHPIGRHRHDRRKRVVD 181 

Query: 243 SKNGQHAMTIIDVLK-YIQNSSLIKCRIiETGRTHQIRVHLSHHGHPLIGDPLY-NPSSMN 300 

NG+ A+T + ++K + + +SL+ C+L+TGRTHQIRVHL+H GH L GDPLY N + 
Sbjct: 182 PINGKKAITEVTLVT<NFHKTASLLTCQLQTCRTHQIRVHLAHQGHVLFGDPLYSNGKKDC 241 

Query: 301 ERLMLHAHRLTLSHPLTCETISVEAPSSTFEKILMNYK 338 

RLMLHA++L L HPLT E I V+A S+TF+ +LN K 
Sbjct: 242 ARLMLHAYQLRLKHPLTQEDICVQAKSATFDAVLNAQK 279 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2023 

A DNA sequence (GBSx2134) was identified in S.agalactiae <SEQ ID 6261> which encodes the amino 
acid sequence <SEQ ID 6262>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

■ 114 ( 93 - 119) 



Final Results 

bacterial membrane --- Certainty=0.4509 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF04735 GB:AF101780 penicillin-binding protein 2a 
[Streptococcus pneumoniae] 
Identities = 424/773 (54%), Positives = 555/773 (70%), Gaps = 47/773 (6%) 

Query: 2 KLFDKFIDLFRVDEDNDRMTRKNEQETREETSNLDGEEVYDIDDITRPSKSQYQRGIRHQ 61 

KLF+KF+ LF+ +ETS L+ + I R S+S 
Sbjct: 5 KLFEKFLSLFK KETSELEDSD STIIiRRSRS 34 

Query: 62 KENAKSRPEWLQKVDRYLPSPKNPIRRFWRRYRIGKLLFIJUjMAFILIFGSYIiFYLSKTA 121 

DR + PIR+FWRRY + K++ I ++ L+ G YLF ++K+ 
Sbjct: 35 DRKKLAQVGPIRKFWRRYHLTKIILILGLSAGIjIjVGIYLFAVAKST 80 

Query: 122 WSDLQSALKTTTTIYDKNKEYAGKLSGQKGTYVEimiSDHLKNAVIATEDRTFYENNG 181 

V+DLQ+ALKT T I+D+ ++ AG LSGQKGTYVEL IS +L+NAVIATEDR+FY+N+G 
Sbjct: 81 NVM)LQNALKTRTLIFDREEKEAGALSGQKGTYVELTDISKNLQNAVIATEDRSFYKNDG 140 

Query: 182 WFKRFFLAVATLGKFGGGSTITQQLAKNAYLSQDQTIICRKAREFFLALELTKKYSICAEI 241 

4-N+ RFFLA+ T G+ GGGSTITQQLAKNAYI.SQDQT++RKA+EFFLALEL+KKYSK +1 
Sbjct: 141 INYGRFFLAIOTAGRSGGGSTITQQLAKNAYLSQDQTVERKAKEFFLALELSKKYSKEQI 200 

LTMYLNN+YFGNGVWGVEDAS+KYFG SA+ +++D+AATLAGMLKGPE+YNP SVE++T 
Sbjct: 201 LTMYLNHAYFGNGVWGVEDASKKYFGVSASEVSLDQAATLAGMLKGPELYNPLNSVEDST 260 

Query: 302 NRRDTVLAAMVDAGKLTKSQAKEAASIGMKI^^ 361 

NRRDTVL MV AG + K+Q EAA + M ++li D Y GKI+DYRYPSYFDAWNEA+ 
Sbjct: 261 NRRDTVLQMWAAGYIDKNQETEAAEVEKTSQLHDKYEGKISDYRYPSYFDAVVNEAVSK 320 

Query: 362 YGISEKDIVNNGYKIYTALDQNYQSGMQKTFDDTSLFPVSDYDGQSAQGASVALDPKTGG 421 

Y ++E++IVNNGY+IYT LDQNYQ+ MQ +++TSLFP ++ DG AQ SVAL+PKTGG 
Sbjct: 321 YNLTEEEIVffflGYRIYTELDQlTYQANMQIVYEHTSLFPRAE-DGTFAQSGSVALEPKTGG 379 

Query: 422 VRGLVGRVQSTKDAQFRSFNYATQSKRSPASTIKPLWYSPAIASGWSIDKELPNKVQDF 481 

VRG+VG+V FR+FNYATQSKRSP STIKPLWY+PA+ +GW+++K+L N + 

Sbjct: 380 VRGWGQVADOTJKTGFRNFOTATQSKESPGSTIKPLVVYTPAVEAGWALNKQLDNHTMQY 439 

Query: 482 HGYKPSNYGGIET-ESIPMYQALANSYKIPAvYTLDKLGINKAFTYGRKFGLNMSSANKE 540 

YK NY GI+T +PMYQ+LA S N+PAV T++ LG++KAF G KFGUSIM ++ 
Sbjct: 440 DSYKVDIWAGIKTSREVP^QSLAESLNLPAVATVOT)LGVD 499 

Query: 541 LGVALG<3SVTTNPLEMAQAYSTFAroX5IMHR6HLITRIETANGKLVKQFTDKPKRVISRS 600 

LGVALG V TNPL+MAQAY+ FAN+G+M AH I+RIE A+G+++ + KRVI +S 
Sbjct: 500 LGVALGSGVETNPLQMAQAYAAFANEGLMPE^FISRIE3S!ASGQVIASHKNSQKRVIDKS 559 
Sbjct: 560 VM3mTSMMLGTFTNGTC3ISSSPADYVimGKTGTTEA.VENPEYTSDQWVIC3YTPDWISH 619

Query: 661 WVGFKNTDKHHYLTDSSAGTASNIFSTQASYILPYTKGSSFTHIENAYFQNGIGSVYNAQ 720 

W+GF TD++HYL S++ A+++F A+ ILPYT GS+FT +ENAY QNGI + 
Sbjct: 620 WLGFPTTDENHYLAGETSNGAMIVFRNIANTILPYTPGSTFT-VENAYKQNGIAPANTKR 678 

Query: 721 DASNTTNQESRSIINDLKDSASKAAQDISRAVEDSNFQEKVKDAWNSLKDYFR 773 

N ++ ++D++ A + SRA+ D+ +EK + W+S+ + FR 

Sbjct: 679 QVQTNDNSQTDDNLSDIRGRAQSLVDEASRAISDAKIKEKAQTIWDSIVNLFR 731 
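
The summary line that heads each alignment (Identities, Positives, Gaps) follows a fixed count/length pattern, so it can be checked mechanically. The following is a minimal sketch in Python, not part of the original analysis; the function names parse_summary and is_consistent are illustrative assumptions. It extracts the counts and flags a line whose fields disagree on the alignment length or whose count exceeds that length.

```python
import re

# Matches e.g. "Identities = 424/773" (the percentage in brackets is ignored
# and recomputed from the counts instead).
SUMMARY_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, length, percent)} for one alignment summary line."""
    fields = {}
    for name, count, length in SUMMARY_FIELD.findall(line):
        count, length = int(count), int(length)
        fields[name] = (count, length, 100.0 * count / length)
    return fields


def is_consistent(fields):
    """True when all fields share one alignment length and no count exceeds it."""
    lengths = {length for _, length, _ in fields.values()}
    return len(lengths) == 1 and all(c <= n for c, n, _ in fields.values())


line = "Identities = 424/773 (54%), Positives = 555/773 (70%), Gaps = 47/773 (6%)"
summary = parse_summary(line)
print(summary["Identities"][:2], is_consistent(summary))
```

A line that fails is_consistent is a candidate transcription error rather than a biological result.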

A related DNA sequence was identified in S.pyogenes <SEQ ID 6263> which encodes the amino acid 
sequence <SEQ ID 6264>. Analysis of this protein sequence reveals the following: 

Possible site: 52 



Final Results 

bacterial membrane --- Certainty=0.4185 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
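
Each "Final Results" block lists three candidate locations with a certainty value, and the location reported for the protein is the one with the highest value. As a hedged illustration (the regular expression and function names below are assumptions, not part of the prediction program), the block can be read like this, tolerating stray spaces inside the number:

```python
import re

# One result line: location name, certainty value (possibly with stray spaces),
# and the verdict in parentheses.
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+).*?Certainty\s*=\s*([\d.\s]+?)\s*\((.*?)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for one Final Results block."""
    results = {}
    for name, number, verdict in RESULT_LINE.findall(text):
        results[name] = (float(re.sub(r"\s+", "", number)), verdict)
    return results


def predicted_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda name: results[name][0])


block = """bacterial membrane --- Certainty=0. 4185 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>"""
results = parse_final_results(block)
print(predicted_location(results), results)
```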

The protein has homology with the following sequences in the databases: 

>GP:AAF04735 GB:AF101780 penicillin-binding protein 2a [Streptococcus pneumoniae] 
Identities = 414/730 (56%), Positives = 539/730 (73%), Gaps = 17/730 (2%) 



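[Row labels and coordinates for this alignment were displaced from their sequence lines. Row start positions that survive, in order: Query 50 / Sbjct 18; Query 110 / Sbjct 63; Query 170 / Sbjct 123; Query 230 / Sbjct not recoverable; Query 290 / Sbjct 243; Query not recoverable / Sbjct 303; Query 410 / Sbjct 362; Query 470 / Sbjct 422; Query 529 / Sbjct 482; Query 589 / Sbjct 542.]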



LL+G YLF ++K+ V+DLQ+ALK T+I+D + + AG+LSGQKG+YVEL IS + 



L+NAVIATEDR+FY N GIN RF LA+VTAGR GGGSTITQQLAKNAYLSQDQT++RKA 



MLKGPE+YNP +S+4++T+RRDTVL MV A I + + +A V + ++L D Y GK 



Query: 350 DYKYPSYFDAVISEAIATYGLSEKDIVNNGYKVYTELDQNYQTGMQTTFNNDELFPVSAY 409
DY+YPSYFDAV++EA++ Y L+E++IVNNGY++YTELDQNYQ MQ + N LFP A

Sbjct: irVNNGYRIYTELDQNYQANMQIVYENTSLFP-RAE 361



DG+ AQ+ SVAL+PKTGGVRG++G+V ++ FR+FNYATQ+KRSP STIKPLWY PA 



V +GW++ K+L N +D Y+ NY G S +VPMYQ+LA S N+PAV+T+ND+G+DK 



KRV+ +SVADKMT+MMLGTF+NGT ++++ Y +AGKTGTTE FNP+ 
Query: 649 LAGDQWVIGYTPDWISQWVGFNQTDENHYLTDSSAGTASAIFSTQASYILPYTKGSQFH 708

DQWVIGYTPDWIS W+GF TDENHYL S++ A+ +F A+ ILPYT GS F 
Sbjct: 602 YTSDQWVIGYTPDWISHWLGFPTTDENHYLAGSTSNGAAHVFRNIANTILPYTPGSTFT 661

Query: 709 VDNAYAQNGISAVYGVNETGNQSGVDTQSIIDGLEKSAQEASQSLSKAVDQSGLRDKAQS 768
V+NAY QNGI+ + T + +R AQ S+A+ + +++KAQ+

Sbjct: 662 VENAYKQNGIAPANTKRQVQTNDNSQTDDKLSDIRGRAQSLVDEASRAISDAKIKEKAQT 721

Query: 769 IWKEIVDYFR 778 

IW IV+ FR 
Sbjct: 722 IWDSIVNLFR 731 
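
In these alignments each row carries a start coordinate, the aligned residues and an end coordinate, and the end should equal the start plus the number of residues minus one, with gap characters not counted. The sketch below is an illustrative check under that assumption; the function name row_is_consistent is hypothetical.

```python
def row_is_consistent(start, residues, end):
    """Check that end == start + (number of residues) - 1.

    Gap characters ('-') and spaces do not advance the coordinate,
    so only alphabetic characters are counted.
    """
    aligned = sum(1 for ch in residues if ch.isalpha())
    return start + aligned - 1 == end


# The final row of the alignment above.
query_ok = row_is_consistent(769, "IWKEIVDYFR", 778)
sbjct_ok = row_is_consistent(722, "IWDSIVNLFR", 731)
print(query_ok, sbjct_ok)
```

A row that fails the check has usually lost or gained characters, or has a misread coordinate.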

An alignment of the GAS and GBS proteins is shown below. 

Identities = 530/715 (74%) , Positives = 623/715 (87%) , Gaps = 1/715 (0%) 

IAKSRPEWLQKVDRYLPSPKNPIRRFWRRYRIGKLLFIALMAFILIFGSYLFYLS 118 
h KSRP WLQK++ LPSP4- PIRRFWRRY IGKLL I + +L+ GSYLFYLS 



KTA VSDLQ ALK TT I YD EYAG LSGQKG+YVELNAISD L+NAVIATEDRTFY 



N+G+N KRF LAV T G+FGGGSTITQQLAKNAYLSQDQTIKRKAREFFLALELTKKYSK 



+ILTMYI^SYFGNGWGVEDAS+I<YFGT+AANLT+aEAATIAGMI ) KGPE+YNPY+S++ 



NAT+RRDTVL AMVDA K+T+++A++A ++G+KNRLADTY GK +DY+YPSYFDAV++EA 



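[Row labels and coordinates for this alignment were displaced from their sequence lines. Row start positions that survive, in order: Query 59 / Sbjct 65; Query 119 / Sbjct 125; Query 179 / Sbjct 185; Query 239 / Sbjct 245; Query 299 / Sbjct 305; Query 359 / Sbjct 365; Query 419 / Sbjct not recoverable; Query 479 / Sbjct 485; Query 539 / Sbjct 545; Query 599 / Sbjct not recoverable; Query 659 / Sbjct 665; Query 719 / Sbjct 724.]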



-GRV S+4+ FRSFNYATQ+KRSPASTIKPLWY+PA+ASGWSI+KELPN V 



QDF GY+P NYG E+E 4- PMYQALANSYNI PAV TL+ +GI+KAFTYG+ FGL+MSSA 



KELGVALGGSVTTNPLEMAQAY+ FAN+G++H AHLI RIE A G+++K FTDK KRV+S 



SQWVGF TD++HYLTDSSAGTAS IFSTQASYILPYTKGS F H++KAY QM3I +VY 



+++SH+ L+ SA +A+Q +S+AV+ £ 



SEQ ID 6262 (GBS397d) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 153 (lane 13; MW 76kDa) and in Figure 184 (lane 9; MW 76kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 2024 

A DNA sequence (GBSx2135) was identified in S.agalactiae <SEQ ID 6265> which encodes the amino
acid sequence <SEQ ID 6266>. This protein is predicted to be an M-like protein. Analysis of this protein
sequence reveals the following:

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628)
INTEGRAL Likelihood = -0.00 Transmembrane 19 - 35 ( 19 - 35)

Final Results

bacterial membrane --- Certainty=0.5225 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
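
Each INTEGRAL line gives a likelihood score, a core transmembrane span and, in parentheses, an extended span. The sketch below shows one way to read such lines; it is illustrative only, the names Segment and parse_segments are assumptions, and it assumes that a more negative likelihood marks a stronger membrane-spanning prediction.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int
    ext_start: int
    ext_end: int

    @property
    def core_length(self) -> int:
        """Number of residues in the core span, both ends inclusive."""
        return self.end - self.start + 1


INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return one Segment per complete INTEGRAL line found in text."""
    return [
        Segment(float(m[1]), int(m[2]), int(m[3]), int(m[4]), int(m[5]))
        for m in INTEGRAL.finditer(text)
    ]


report = """INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628)
INTEGRAL Likelihood = -0.00 Transmembrane 19 - 35 ( 19 - 35)"""
segments = parse_segments(report)
# Assumption: the most negative likelihood is the strongest prediction.
strongest = min(segments, key=lambda s: s.likelihood)
print(strongest, strongest.core_length)
```

Lines whose coordinates were lost in transcription simply do not match and are skipped, which makes incomplete reports easy to spot by comparing the number of parsed segments with the reported ALOM count.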

The protein has homology with the following sequences in the GENPEPT database. 



Sbjct: 



3 KPE +PE KPE KPE KP P+ +P KPE KPE KPE K E KPE K E 



Query: 568 KP 569 
KP 

Sbjct: 296 KP 297 

There is also homology to SEQ ID 822. 

A related GBS gene <SEQ ID 8957> and protein <SEQ ID 8958> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -5.20 
GvH: Signal Score (-7.5): 3.07 

Possible site: 27 
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -10.56 threshold: 0.0

INTEGRAL Likelihood =-10.56 Transmembrane 609 - 625 ( 599 - 628)
INTEGRAL Likelihood = -0.00
PERIPHERAL Likelihood = 8.54
modified ALOM score: 2.61

Final Results

bacterial membrane --- Certainty=0.5225 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



LPXTG motif: 596-600 
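
The LPXTG motif reported above is a five-residue pattern in which X stands for any amino acid. A minimal sketch of locating it in a sequence follows; the function name find_lpxtg is an assumption, and the fragment used is the C-terminal stretch of the query as printed in the alignment further on, so the positions returned are relative to that fragment, not to the full-length protein.

```python
import re

# L-P-any-T-G; '.' stands for the variable residue X.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(sequence):
    """Return (start, end, motif) tuples with 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in LPXTG.finditer(sequence)]


fragment = "KYSKKLPSTGEAASPLLAIVSLIVMLSAGLITIVLKHKKN"
hits = find_lpxtg(fragment)
print(hits)
```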

The protein has homology with the following sequences in the databases:

ORF00748(313 - 2190 of 2490) 

GP|2462785|gb|AAB71985.1||U73163(3 - 374 of 374) M-like protein {Streptococcus equi}
%Match = 9.2

%Identity = 36.0 %Similarity = 55.4 

Matches = 126 Mismatches = 147 Conservative Sub.s = 68 



LS* * IRIFN*LYKGANMNNiraKKVKYFLRKTAYC-^ 

:|::|::|||:|:||||:|||: | : | ||: 
MAKKEMKF YLRKSAFGLAS VS AALL VGAARVSADS 



696 726 756 786 813 843 870 900

KVSDQEIfiKQSRRSQDIIKSLGFLSSDQKDILVKSISSSK-DSQLILKFVTQATQUlNAESTKAK-QMAQNDVALIKNIS 
::| | :: | | :::| :: |: | : ||: || |:| : 

- VESAGPVAVAVTDSLDSEAAATKAEADLVAAKADLAAAEVAITAAKAEFDTAQADLATAEATI 

40 50 60 70 80 90 


921 951 981 1011 1041 1071 1101 1131 

PEV- --LEEYKEKIQPASTKSQVDEFVAEAKKVVNSNKETLVNQ 
I: = I ::||| I I : = I : I = : = = I = -III : : |:| : | 

AELEQKIPELEKKIQEAQEKLNYENRPS-PKRVGSDDEDDTVARKLMSEKEALKAE LQKTKEALDTAKRAYAGI 

110 120 130 140 150 160 170



1161 1191 1221 1251 1281 1311 1638 1668 

KT.NITAAMNALWSIKQAAQEVAQKNLQKQYAKKIERISSKGLALSKKAKEIYEKHKSILPTP AKPDVKPEAKPDVK 

I l>> =1 I H= I I = II 



1698 1740 1770 1800 1830 

PKAKPDVKPEAKPDVKPD VKPDVKPEAKPEDKPD VXPDVKPEAKPDVKPEAKPE 

iim i 1 1 = i = 1 1 1 i i i i ■■ ih in ii= in in 

VKAELKAAGASDFYTKKIDSADTVDGVKTLREMILDSIAKPEVEPEAKPEPKLEPKPEPKPEPKPEPKPEPKPE 

220 230 240 250 260 270 280 

1860 1890 1920 1950 1980 2010 2040 2070 

AKPEAKPEAKPEAKPEAKPDVKPEAKPDVKPEAKPEAKPEAKSEAKPEAKLEAKPEAKPATKKSVNTSGNLAAKKAIENK 
III III III III Ih Ih :| I Mill III I 

PKPEPKPEPKPEPKPEPKPEPKPKPQPKPAPAPKPEAKKEEKKAAP K 

300 310 320 330 

2100 2130 2160 2190 2220 2250 2280 2310 

KYSKKLPSTGEAASPLLAIVSLIVMLSAGLITIVLKHKKN*IYF*T*TERSILSKS*GKPHQNFAFFI*ILE*FSRYFN* 
: = IN :\:: M II 11= = = Ml 

QDTNKLPSTGEATNPFFTAAAIiAVMAGAGVAAVSTRRKEN 
350 360 370 

SEQ ID 6266 (GBS3) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 3 (lane 5; MW 65kDa). The GBS3-His fusion product was purified (Figure 189, 
lane 8) and used to immunise mice. The resulting antiserum was used for FACS (Figure 261), which 
confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 




Example 2025 

A DNA sequence (GBSx2136) was identified in S.agalactiae <SEQ ID 6267> which encodes the amino 
acid sequence <SEQ ID 6268>. This protein is predicted to be transcription antitermination protein nusg 
(nusG). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3203 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

3 carnosus] 

3/175 (67%), Gaps = 2/175 (1%) 

Query: 7 KGWFVLQTYSGYENKVKENLLQRAQTYNMLDNILRWIPTQTVNVEKNGKTKEIEENRFP 66 

K W+ + TYSGYENKVK+NL +R ++ NM + I RV IP + K+GK K++ + FP 

Sbjct: 8 KRVWA VHTYSGYENKVKKNLEKRVESMNMTEQI FRWI PEEEETQVKDGKAKKLTKKTFP 67 

Query: 67 GYVLvE^^vMTDEAWFVVPJS^TPNvTGEVGSHGNRSKPTPLLEEEIRSILISMGQTVDVFDT 126 

GYVLVE+VMTDE+W+WRNTP VTGFVGS G SKP PLL +E+R IL MG D 
Sbjct: 68 GYVLVELTOTDESWYVVRNTPGVTGFVGSAGAGSKSNPLLPDETOFILKQMGMKEKTIDV 127 



A related DNA sequence was identified in S.pyogenes <SEQ ID 6269> which encodes the amino acid 
sequence <SEQ ID 6270>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3874 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 170/179 (94%) , Positives = 178/179 (98%) 

Query: 1 MLDSFDKGWFVLQTYSGYENKVKENLLQRAQTYNI4IJDNILRVBIPTQTVNVEKNGKTKEI 60 

MLDSFDKGWFVLQTYSGYENKVKENLLQRAQTYl^LDNILRvEIPTQTVNVEKNG+TKEI 
Sbjct: 6 MLDSFDKGWFVLQTYSGYENKVJCENLLQRAQTYNMLDNI LRVE I PTQTVNVEKNGQTKE I 65 

Query: 61 EECKFPGYVLVEMvMTDEAWFVVPUSTIPNWGFVGSHGNRSKPTPLLEEEIRSIIjISMGQT 120 

EENRFPGYVIjVEMvMTDEAWFVVRKITPNVTGFVGSHGNRSKPTPLLEEEIR+IL+SMGQT 
Sbjct: 66 EENRFPGYVIiVEMvMTDEAWFVvRNTPKTWGFVGSHGNRSKPTPLLEEEIRAILLSMGQT 125 

Query: 121 VDVFDTNIKEGDWQIIDGAFIGQEGRVVEIEN1>IIC/KLMINMFGSETQAEIjELYQVAEL 179 

+DVFDTNIKEGDWQIIDGAF+GQEGRWEIENNKVKLM+NMFGSET AE+ELYQ+AEL 
Sbjct: 126 IDVFDTNIKEGDWQIIDGAFMGQEGRV\ffiIIMNKV 184 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 2026

A DNA sequence (GBSx2137) was identified in S.agalactiae <SEQ ID 6271> which encodes the amino 
acid sequence <SEQ ID 6272>. This protein is predicted to be a glycosyl transferase. Analysis of this 
protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1558 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus ducreyi]

Identities = 98/259 (37%) , Positives = 155/259 (59%) , Gaps = 10/259 (3%) 



Query: 5 VAIAVDSNYLDKALVTIKSICVYNRNITFYL^ 64
+ LA + +Y + L TIKSI ++N++I FYIi N+D PEW +N KL L S++I++K+
Sbjct: 10 IVLAANQSYSEYILTTIKSIYLHNKHIRFYLLNRDYPTEWFTJILNNKLRKLNSEIIDIKV 69

Query: 65 YNYDIAHLTTFLTVS - - -TWFRLFIADYIPSSRVLYLDSDIIVNTNLDYLFELDFKGYYL 121
N I + T+ +S T+FR F++D+I +V+YLD+DI+VN +L L++ D Y+L
Sbjct: 70 TNDTIKNFKTYSHISSDTTFFRYFISDFIEQDKVIYLDADIWNGSLTELYQTDISNYFL 129

Query: 122 AAVKDPHKNE EGGFNAGMLLANLELVTOEIXSLTKTLLKTAEELKRVVKrGDQSIIjNI 177
AAVKD + FNAGMLL N + WRE +T+ L +E+ + DQSILN+
Sbjct: 130 AAVKDIISEKIYVNNHIFWAGMLLimKKWREHNITQFCLSLSEKYINSLPDADQSILNL 189

Query: 178 VCHNRWLSLNKTWNF- -QTYDWSRYNHRSYLYLNIENRTPNI IHFLTSDKPWNENSVAR 235
+ ++WL LN+ +N4 T + +Y YL ++ P IIH+ T KPW R
Sbjct: 190 I FKDKWLKLNRGYNYLIGTDYLFFKYGKTRYLE-DLGETIPLI IHYNTEAKPWLNI FNTR 248

Query: 236 FRELWWYYFQLDFCQLTGK 254
FR ++W+Y++L++ + K
Sbjct: 249 FRNIYWFYYELNWQDIYAK 267





No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2027 

A DNA sequence (GBSx2138) was identified in S.agalactiae <SEQ ID 6273> which encodes the amino 
acid sequence <SEQ ID 6274>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0417 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2028 

A DNA sequence (GBSx2139) was identified in S.agalactiae <SEQ ID 6275> which encodes the amino 
acid sequence <SEQ ID 6276>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.60 Transmembrane 305 - 322 ( 306 - 322) 



Final Results

bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF28363 GB:AF224467 putative glycosyl transferase [Haemophilus ducreyi]

Identities = 88/259 (33%) , Positives = 156/259 (59%) , Gaps = 11/259 (4%)

Query: 7 VVIiAGDySYIRQIETTLKSLCVYHENLSIFIFNQDIPQEWFLAMKDRVGQTGNQIQDVKL 66

+VLA + SY I TT+KS+ ++++++ ++ N+D P EWF + +++ + ++I D+K+
Sbjct: 10 IVLAANQSYSEYILTTIKSIYLHNKHIRFYLLNRDYPTEWFDILNNKLRKLNSEIIDIKV 69

Query: 67 FHDHLSPKWENKIQjNHINY-MIYARYFIPQYISADTVLYLDSDLVVTTNLDNLFQISIjDN 125
+D + K +HI+ T+ RYFI +1 D V+YLD+D+W +L L+Q + N

Sbjct: 70 TNDTIK---NFKTYSHISSDTTFFRYFISDFIEQDKVIYLDADIWNGSLTELYQTDISN 126

Query: 126 AYLAAVP ALFGLGYGFNAGVMVINNQRWRQENMTIKLIEKNQKEIENANEGDQTI 180

+LAAV ++ + ENAG+4+INN++WR+ N+T + ++K I + + DQ+I

Sbjct: 127 YFLAAVHDIISEKIYVNNHIFNAGMLLINNKKWREHNITQFCLSLSEKYINSLPDADQSI 186

Query: 181 LNRMFENQVIYLDDTYNFQIGFD-MGAAIDGHKFIFDIPITPLPKIIHYISGIKPWQTLS 239

LN +F+++ + L+ YN+ IG D + +++ D+ T +P IIHY + KPW +

Sbjct: 187 LNLIFKDKWLKIJTOGYNYLIGTDYLFFKYGKTRYLEDLGET-IPLIIHYNTEAKPWLNIF 245

Query: 240 NMRLREVWWHYNLLEWSSI 258

N R R ++W Y L W I
Sbjct: 246 NTRFRNIYWFYYELNWQDI 264

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 6276 (GBS395) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 75 (lane 5; MW 47.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 8; MW 72kDa) and in Figure
177 (lane 5; MW 72kDa).

GBS395-GST was purified as shown in Figure 217, lane 7.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2029 

A DNA sequence (GBSx2140) was identified in S.agalactiae <SEQ ID 6277> which encodes the amino
acid sequence <SEQ ID 6278>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1633 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 2030 

A DNA sequence (GBSx2141) was identified in S.agalactiae <SEQ ID 6279> which encodes the amino 
acid sequence <SEQ ID 6280>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 36 - 52 ( 3S - 52)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10243> which encodes amino acid sequence <SEQ ID 
10244> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:ARC77330 GB:AE000508 orf, hypothetical protein [Escherichia coli K12] 
Identities = 75/260 (28%) , Positives = 123/260 (46%) , Gaps = 22/260 (8%) 





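[Row labels and coordinates for this alignment were displaced from their sequence lines. Row start positions that survive, in order: Query 6 / Sbjct 25; Query 65 / Sbjct 85; Query not recoverable / Sbjct 139; Query 176 / Sbjct 198; Query 231 / Sbjct 258.]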



YLSHPKYMSLRSWFRTGNFVNKDF TYYEVPMKLD VFDDEAFKKSSIDFYWA 116 

Y + ++ + R GN +4 D+ T ++P+++D +FD S FY+ A 

YTTKREFFDPLRFVRGGNLIDLDWLVEATASQMPLQMDTAARLFD SGKSFYMCA 138 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



A related GBS gene <SEQ ID 8959> and protein <SEQ ID 8960> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -5.16
GvH: Signal Score (-7.5): -2.17

Possible site: 44
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -0.16 threshold: 0.0
INTEGRAL Likelihood = -0.15 Transmembrane
PERIPHERAL Likelihood = 4.14 18
modified ALOM score: 0.53

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF01611(316 - 1050 of 1449)

OMNI|NT01EC5264(37 - 289 of 369) hypothetical protein
%Match =9.2 

%Identity =29.7 %Similarity =49.8 

Matches = 74 Mismatches = 118 Conservative Sub.s = 50 

273 303 333 363 393 420 450 



VGQRIPVTLGNIAPLSLRPFQPGRIALVCEGGGQRGIFTAGVLDEFMRAQFNPFDLYLGTSAGAQHLSAflCNQPGYARK 



YNKKYLSHPKYMSLRSWFRTGNFVNKDF TYYEVPMKLDVFDDEAFKKSSIDFYWATEMTSGKPEYFKIDSVFEQM 

= 1 = :: =1 Ih- h I = = h = :| : I 11= I I II > > = 

VIMRYTTKREFFDPLRFvRGGNLIDLDWLVSATASQMPLQMDT- -AARLFDSGK3FYMCACRQDDYAPNYF-LPTKQNWL 



EILRASSALPWSKM-VDWQGKKYLDGGLSDSIPVDFARGL^FDKLIVVMTRPLNYQKKPSS-GRLYKTL YRKYPN 

<:<lllll<l = I =1 11111=11=111 I I 1 = 1= II I 1= = I = I 

DVIRASSAIPGFTRSGVSLEGINYLDOTISDAIPVKEAARQGAKTLWIRWPSQMYYTPQWFKRMERWLGDSSLQPLvN 
180 190 200 210 220 ' 230 240 

960 990 1020 1050 1080 1110 1140 1170 

FVKTASNRYQQYNNSLEKVMSLEKTGDLFAIRPSKSLVIGRLEKNPDKLDSIYQLGMKDAKSVMPELNSYLMK*RKQYFS 

=1= h =11 = === =1 h =1 

LVQHHETSYRDIQQFIEKPPGKLRIFEIYPPKPLHSIALGSRIPALREDYKLGRLCGRYFLATVGKLLTEKAPLTRHLVP 
260 270 280 290 300 310 320 

SEQ ID 8960 (GBS394) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 4; MW 34.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 83 (lane 7; MW 60kDa). 

GBS394-GST was purified as shown in Figure 217, lane 6.

Example 2031

A DNA sequence (GBSx2142) was identified in S.agalactiae <SEQ ID 6281> which encodes the amino 
acid sequence <SEQ ID 6282>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3004 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 2032 

A DNA sequence (GBSx2143) was identified in S.agalactiae <SEQ ID 6283> which encodes the amino
acid sequence <SEQ ID 6284>. This protein is predicted to be a transporter protein. Analysis of this protein
sequence reveals the following:



Possible site: 49 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.85 Transmembrane
INTEGRAL Likelihood = -6.74 Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = Transmembrane
INTEGRAL Likelihood = -2.97 Transmembrane
INTEGRAL Likelihood = -2.87 Transmembrane
INTEGRAL Likelihood = -1.44 Transmembrane
INTEGRAL Likelihood = -0.64 Transmembrane

[The residue ranges for these segments survive only as fragments, in this order: "- 389 ( 370 -", "- 184 ( 162 -", "- 275 ( 257 -", "- 302 ( 285 -", "- 327 ( 310 -", "- 371 ( 355 -", "- 124 ( 108 -". Their pairing with the likelihood values above is not recoverable.]

Final Results

bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22759 GB:U32790 transporter protein [Haemophilus influenzae Rd]

Identities = 139/391 (35%) , Positives = 221/391 (55%) , Gaps = 4/391 (1%) 

Query: 6 INKNNWPJUjIAAIVASGTDDLNIMFLAFSMSTIITDLHLSAAQAGWIGTITNLGMLVGGL 65 

+N W+ALI + V G D +++ L F +S I DL+L+ AQ G + T T +G + GG+ 
Sbjct: 5 VNSYGWKALIGSAVGYGMDGFDLLILGFMLSAISADLNLTPAQGGSLVTWTLIGAVFGGI 64 

Query: 66 IFGLLADRYNKFKVFKWTILIFSIATGLVFFTTNLSYLYIMRFIAGIGVGGEYGIAIAIM 125 

+FG L+D+Y + +V WTIL+F++ TGL L I R IAGIG+GGE+GI +A+ 

Sbjct: 65 LFGALSDKYGRVRVLTWTILLFAVFTGLCAIAQGYWDLLIYRTIAGIGLGGEFGIGMALA 124 

Query: 126 AGIVPTNKMGRISSLNGIAGQVGSISSALLAGWLAPALGWRGLFLFGLLPIVLVLWMQFA 185 

A P + +S + QVG + +ALL L P +GWRG+FL G+ P + +++ 

Sbjct: 125 AEAWPARHRAKAASYVALGWQVGVLGAALLTPLLLPHIGWRGMFLVGIFPAFVAWFLRSH 184 

Query: 186 VDDKDILDQYNTDADDEPLDI SIKALFDTPvLATQSIjALMVMTTVQIAGYFGI*IMNW 241 

+ + +IQT + S + L +SL ++V+T+VQ GY+G+M W 

Sbjct: 185 LHEPEIFTQKQTALSTQSSFTDKLRSFQLLIKDKATSKISLGIWLTSVQNFGYYGIMIW 244 

Query: 242 LPTIIQTMjNVSVKNSSLVMIATILGMCLGI^VTX3QLLDKFGPRLVYGCFLLSSAICVYL 301 

LP + L S+ S LW T+ GM G+ +FGQL D+G + + FL + I + + 
Sbjct: 245 LPNFLSKQLGFSLTKSGLWTAVTVCGMMAGIWIFGQLADRIGRKPSFLLFQLGAVISIW 304 

Query: 302 FQFATTMPSMIIGGAWGFFWGMFAGYGAMITRLYPHHIRSTANNLILNVGRAIGGFSS 361 

+ T M++ GA +G FVNGM GYGA++ YP R+TA N++ N+GRA+GGF 
Sbjct: 305 YSQLTDPDIMLLAGAFLGMFWGMIiGGYGALMAFAYETEARATAQNVLFNIGRAVGGFGP 364 

Query: 362 VI IGMILDVSNVSMVMLFLASLYIVSFLSML 392 

V++G ++ + + LA +Y++ L+ + 
Sbjct: 365 VWGSWLAYSFQTAIALLAIIYVIDMLATI 395 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2377> which encodes the amino acid
sequence <SEQ ID 2378>. Analysis of this protein sequence reveals the following: 





Possible site: 39

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.92 Transmembrane 168 - 184 ( 162 - 188)
INTEGRAL Likelihood = -5.41 Transmembrane 286 - 302 ( 285 - 306)
INTEGRAL Likelihood = -5.15 Transmembrane 372 - 388 ( 362 - 394)
INTEGRAL Likelihood = -3.45 Transmembrane 259 - 275 ( 257 - 276)
INTEGRAL Likelihood = -2.87 Transmembrane 311 - 327 ( 306 - )
INTEGRAL Likelihood = -2.81 Transmembrane 55 - ( 51 - 71)
INTEGRAL Likelihood = -0.48 Transmembrane 108 - 124 ( 108 - 125)
INTEGRAL Likelihood = -0.37 Transmembrane 84 - 100 ( 84 - 100)



Final Results

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 306/402 (76%) , Positives = 354/402 (87%) 

Query: 1 MSPIiNINKNJSIWRALIAAIVASGTDDLNIMFLAFSMSTIITDLHLSAAQAGWIGTITNLGM 60 

MS L+++ N RAL+AAI ASGTDDLN+MFLAFSMS+I+TDL LS Q GWI TITNLGM 
Sbjct: 1 MSTLSLDTTNKRALVAAIAASGTDDLNVMFLAFSMSSIMTDLGLSGTQGGWIATITNLGM 60 

Query: 61 LVGGLIFGLIjADRYNKFKVFKWTILIFSIATGLVFFTTNLSYLYIMRFIAGIGVGGEYGI 120 

LVGGL+FGLLADR++KFKVFKWTIL+FS+ATGL++FT +L YLY+MRFIAGIGVGGEYG+ 
Sbjct: 61 LVGGLLFGLLADRHHKFKVFKWTILLFSVATGLIYFTQSLPYLYLMRFIAGIGVGGEYGV 120 

Query: 121 AIAI^GIVPTNMGRISSI^GIAGQVGSISSALLAGWLAPALGWRGLFLFGLLPIVLVL 180 

AIAIMAGIVP KMGR+SSLNGIAGQ+GSISSALLAGWLAP+LGWRGLFLFGLLPI+LV+ 
Sbjct: 121 AIAIMAGIVPPEKMGr^SSLNGIAGQMSISSMiUM3WLAPSLGWRGLFLFGLLPILLVI 180 

Query: 181 WMQFAVDDKDILDQYOTDADDEPLDISIKALFDTPV1ATQSLALMVMTWQIAGYFGMMN 240 

WM A+DD+ IDY+++ IILFTL Q+LALMVMTTVQIAGYFGMMN 
Sbjct: 181 raTLAIDDQKIVTOHYGQEEEECSQPIKIWELFKTKSLTAQTIjALiyr^TWQIAGYFGMMN 240 

Query: 241 WLPTIIQTNLNVSVKNSSLWMIATILGMCLGMLVFGQLLDKFGPRLVYGCFLLSSAICVY 3 00 

WLPTI IQT+LN+SVK+SSLWM+AT1+GMCLGML FGQLLD FGPRL+Y FLL+S+ICVY 
Sbjct: 241 WLPTI IQTSLNDSVKSSSLWMVATIVGMCLGMLYFGQLLDCFGPRLIYSLFLLASSICVY 300 

Query: 301 LFQFATTMPSMIIGGAWGFFVNGMFAGYGAMITRLYPHHIRSTANNLILNVGRAIGGFS 360 

LFQFA +M SM+IGGA+VGFFVNGMFAGYGAMITRLYPHHIRSTANN4-ILNVGRA4GGFS 
Sbjct: 301 LFQFANSMASMVIGGAIVGFFVNGMFAGYGAMITRLYPHHIRSTANNVILNVGRALGGFS 360 

Query: 361 SVIIGMILDVSNVSMVMLFLASLYIVSFLSMLSIKQLKRQKY 402 

SV IG ILD S +SMVM+FLASLY++SF +M SI QLK ++Y 
Sbjct: 361 SVAIGS I LDASG I SMVMI FLASLYVI S FGAMW S I GQLKAERY 402 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 2033 

A DNA sequence (GBSx2144) was identified in S.agalactiae <SEQ ID 6285> which encodes the amino 
acid sequence <SEQ ID 6286>. This protein is predicted to be leucyl-tRNA synthetase (leuS). Analysis of 
this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3481 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10241> which encodes amino acid sequence <SEQ ID 
10242> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00259 GB:AF008220 leucine tRNA synthetase [Bacillus subtilis] 
Identities = 569/835 (68%) , Positives = 666/835 (79%) , Gaps = 42/835 (5%)

Query: 10 YNHKEIEPKWQAFWADNHTFKTGTDASKPKFYALDMFPYPSGAGLHVGHPEGYTATDILS 69 

+ HKEIE KWQ +W +N TF T + K KFYALDMFPYPSGAGLHVGHPEGYTATDILS 
Sbjct: 3 FQHKEIEKKWQTYWLENKTFATLDNNEKQKFYALDMFPYPSGAGLHVGHPEGYTATDILS 62 

Query: 70 RFKRAQGHNVLHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSYDWDR 129 

R KR QG++VLHPMGWDAFGLPAEQYA+DTGNDPA FT +NI NF+RQI ALGFSYDWDR 
Sbjct: 63 RMKRMQGYDVLHPMGWDAFGLPAEQYALDTGNDPAVFTKQNIDNFRRQIQALGFSYDWDR 122 

Query: 130 EVNTTDPNYYKWTQWIFTKLYEKGLAYFJffiVPVNWVEELGTAIANEEVLPDGTSERGGYP 189

E+NTTDP YYKWTQWIF KLYEKGLAY EVPVNW LGT +ANEEV+ DG SERGG+P 
Sbjct: 123 EINTTDPEYYKWTQWIFLKLYEKGLAYVDEVPVMCPALGTvIANEEVI-DGKSERGGHP 181 

Query: 190 VVRKPMRQWMLKITAYAERLLEDLEEvDWPESIKDMQRNWIGKSTGAWTFKVKDTDKDF 249 

V R+PM+QWMLKITAYA+RLLEDLEE+DWPESIKDMQRNWIG+S GA+V F + D F 
Sbjct: 182 VERRPMKQWMLKITAYADRLLEDLEELDWPESIKDMQRNWIGRSEGAHVHFAIDGHDDSF 241 

Query: 250 WFTTRPDTLFGATYAVIAPEHALVDAITTADQAEAVAEYKRQASLKSDLARTDLAKEKT 309 

TVFTTRPDTLFGATY VLAPEHALV+ ITTA+Q EAV Y ++ KSDL RTDLAK KT 
Sbjct: 242 TVFTTRPDTLFGATYTVLAPEHALVENITTAEQKEAVEAYIKEIQSKSDLERTDLAKTKT 301

Query: 310 GvTTrGAYAINPWGICEIPWIADYVLAEYGTGAlMAVPAI-IDERDWEFAKQFNLDIIPVLE 369 

GV+TGAYAINPVNG+++P+WIADYVLASYGTGA+MAV? HDERD+EFAK F L + V++ 
Sbjct: 302 GVFTGAYAINPVNGEKLPIWIADYVLASYGTGAVMAVPGHDERDFEFAKTFGLPVKEVVK 361 

Query: 370 GGNVEEAAFTEDGLHINSDFLDGLDKAAAlAKIOTEWLEAEGvGlTOKVTYRLRDWLFSRQR 429 

GGNVEEAA+T DG H+NSDFL+GL K AI K++ WLE G +KVTYRLRDWLFSRQR 
Sbjct: 362 GGNVEEAAYTGDGEHVNSDFLNGLHKQEAIEKVIAWLEETKNGEKKVTYRLRDWLFSRQR 421 

Query: 430 YWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLANLTDWLEVT-REDGV 488 

YWGEP1P+ IHWEDGTSTAVPE ELPL+LP T +I+PSGTGESPLAN+ +W+EVT E G 
Sbjct: 422 YWGEPIPVIHWEDGTSTAVPEEELPLILPKTDEIKPSGTGESPLANIKEWVEVTDPETGK 481 

Query: 489 KGRRETWTMPQmGSSSm r LRYIDPHNTEKLMEELIjKQWLPVDIYVGGAEHAVLHLLYA 548 

KGRRETNTMPQWAGS WY+LRYIDPHK ++LA E L++WLPVD+Y+GGAEHAVLHLLYA 
Sbjct: 482 KGRRETOTMPQ^GSCWYFLRYIDPHNPDQIiASPEKLEKWLPVDMYIGGAEHAVLHLLYA 541 

Query: 549 RFTOKVIjYDLGWPTKEPFQKLFNQGMILGTSYRDSRGALVATDKVEKRDGSFFHVETGE 608 

RFWHK LYD+GWPTKEPFQKL+NQGMILG E E 

Sbjct: 542 RFWHKFLYDIGWPTKEPFQKLYNQGMILG ENNE 575 

Query: 609 ELEQAPAKMSKSLIOJVVNPDDVvEQYGADTLRVYEMFMGPLDASIAWSEEGLEGSRKFLD 668 

KMSKS NWNPD++V +GADTLR+YEMFMGPLDASIAWSE GL+G+R+FLD 
Sbjct: 576 KMSKSKGNWNPDEIVASHGADTIiRLYEMFMGPLDASIAWSESGLDGARRFLD 628 

Query: 669 RVYRLI TTKE ITEENSGALDKVYNETV^VTEQ VDQMKFNTAIAQLMVFYNAAN 722 

RV+RL +1 E L++VY+ETV VT+ + ++FNT I+QLMVF+N A 

Sbjct: 629 RvTOLFIEDSGEI^GKIWGAGETLER\ r YHETV^lKVTDHYTGLRFNTGISQLMVFINEAY 688 

Query: 723 KEDKLFSDYAKGFVQLIAPFAPHLGEELWQVLTASGQSISYVPWPSYDESKLVENEIEIV 782 

K +L +Y +GFV+L++P APHL EELW+ L SG +I+Y WP YDE+KLV++E+EIV 
Sbjct: 689 KATELPIu^YMEGFVTa^LSPVAPHIAEELWEKLGHSG-TIAYEAWPVYDETIQjVDDEVEIV 747 



Query: 783 VQIKGKVKAKLWAKDLSREELQDLALAM3KVQAEIAGKDIIKVIAVPNKLVNIV 837 
VQ+ GKVKAKL V D ++E+L+ LA A+EKV+ ++ GK I K+IAVP KLVN1V
Sbjct: 748 VQLNGKVKAKLQVPMATKEQLKQLAQMEKVKKQLEGKTIRKIIAVPGKLVWIV 802

A related DNA sequence was identified in S.pyogenes <SEQ ID 6287> which encodes the amino acid 
sequence <SEQ ID 6288>. Analysis of this protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4358 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 815/833 (97%) , Positives = 827/833 (98%) 

Query: 7 MTFYNHKEIEPKWQAFWADNHTFKTGTDASKPKFYALDMFPYPSGAGLHVGHPE'GYTATD 66 

MTFY+H IEPKWQAFWADNHTFKTGTDASKPKFYALDMFPYPSGAGLHVGHPEGYTATD 
Sbjct: 1 MTFYDHTAIEPKWQAFWADNHTFKTGTDASKPKFYALDMFPYPSGAGLHVGHPEGYTATD 60 

Query: 67 ILSRFKRAQGHNVLHPMGWDAFGLPAEQYAI^DTGNDPAEFTAENIANFKRQINALGFSYD 126 

ILSRFKRAQGHN+LHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSYD 
Sbjct: 61 1LSRFKRAQGHN1LHPMGWDAFGLPAEQYAMDTGNDPAEFTAENIANFKRQINALGFSYD 120 

Query: 127 WDREVmTDPNYYKIirrQWIFTKLYEKGIiAYEAEVPVNWvEELGTAIANEEVLPDGTSERG 186 

WDREVNTTDPNYYKWTQWIFTKLYEKGIAYEAEVPVNWVEELGTAIANEEVLPDGTSERG 
Sbjct: 121 TOREVNTTDPNYYKWTQWIFTKLYEKGLAYFJffiVPVNm^ELGTAIANEEVLPDGTSERG 180 

Query: 187 GYPvVRKPMRQWILKITAYAERLLEDLEEVDWPESIKDMQRNWIGKSTGANVTFKVKDTD 246 

GYPVTOKPMRQWMLKITAYAERLLEDLEEVDMPESIKDMQRNWIGKSTGANVTFKVKDTD 
Sbjct: 181 GYPVTOKPMRQWMLKITAYAERLLEDLEEVDWPESIKDMQRIOTIGKSTGANVTFiWKDTD 240 

Query: 247 KDFWFTTRPDTLFGATYAVLAPEHALVDAITTADQAEAVAEYKRQASLKSDIiARTDLAK 306 

KDFWFTTRPDTLFGATYAVLAPEHALVDAITTAIXJAEAVA+YKROASLKSDliARTDLAK 
SbjCt: 241 KDFWFTTRPDTLFGATYAVIAPEHALVDAITTADQAEAVAKYKRQASLKSDLARTDrjAK 300 

Query: 307 EKTGVWTGAYAINPWGKEIPWIADYVLASYGTGAIMAVPAHDERDWEFAKQFNLDIIP 366 

EKTGVWTGAYAINPVNG E+PVWIADYVIiASYGTGAIMAVPAHDERDWEFAKQF LDIIP 
SbjCt: 301 EKTGVOTGAYAINPVNGNEMPWIADYVliASYGTGAIMAVPAHDERDWEFAKQFKLDIIP 360 

Query: 367 VLEGGIWEEAAFTEDGLHINSDFLDGLDKAAAIAKMVEWLEAEGVGNEKArrYRLRDWLFS 426 

VLEGGNVEEAAFTEDGLHINS FLDGLDKA+AIAKMVEWLEAEGVGNEICVTYRLRDWLFS 
Sbjct: 361 VLEGGNVEFAAFTEDGLHINSGFLDGLDKASAIAKMVEWLEAEGVGNEKVTYRLRDWLFS 420 

Query: 427 RQRYWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLANLTDWLEVTRED 486 

RQRYWGEPIPIIHWEDGTSTAVPESELPLVLPVTKDIRPSGTGESPLAN+TDWLEVTRED 
Sbjct: 421 RQRYWGEPIPIIfflffiDGTSTAVPESELPLVLPWKDIRPSGTGESPIANVTDWLEVTRED 480 

Query: 487 GVKGRRETNTMPQWAGSSWYYLRYIDPHfrTEKIMEELLKQWLPvDIWGGAEHAvLHLL 546 

GVKGRREIOTMPQWAGSSWYYLRYIDPHNTEKLADEELLKQWLPTOIWGGAEHAVLHLL 
Sbjct: 481 GVKGRRETNTMPQWAGSSWYYLRYIDPHOTEKLADEELLKQWIjPTOIYVGGAEHAVLHLL 540 

Query: 547 YARFWHECVLYDLGWPTKEPFQKLFNQGMILGTSYRDSRGALVATDKVEKRDGSFFHVET 606 

YARFWHKVLYDLGWPTKEPFQKLFNQGMI1GTSYRDSRGALVATDKVEKRDGSFFHVET 
Sbjct: 541 YARFWHKVLYDLGWPTKEPFQKLFIJQGMIIjGTSYRDSRGALVATDKVEKRDGSFFHVET 600 



Query: 667 LDRVYRLITTKEITEENSGALDKVVIET-' 1^ YDQMKFNTAIAQLMVFVNAANKEDK 726 

LDRVYRLITTKEITEENSG7ADKVYNETO1^VTEQVDQMKFOTAIAQLWFVNAANKEDK 
Sbjct: 661 LDRWRLITTKEITEF^SGALDK\nflffiT^/KAVTEQ\T3QMKFNTAIAQLWFVNAANECEDK 720 
Sbjct: 721 LFSDYAKGFVQLIAPFAPHLGEELWQALTASGESISYVPWPSYDESKLVENDVEIWQIK 780

Query: 787 GCTKAKLWAKDLSREELQDLAGAMEKVQAEIAGKDIIiO/IAVPNKLVNIVVK 839 
GK^KAKLWAKDLSREELQ++AIJ^K^QAEIAGK11IIKVIAVPNKLVNIV+K 
Sbjct: 781 GK^KAKLVVAKDLSREELQEVALAMEKVQAEIAGKDIIKVIAVPNKLWIVIK 833

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
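
The identity and positive figures quoted with each alignment can be re-derived from the raw counts on the summary line. The following is a minimal sketch, assuming the summary line keeps the "Identities = n/m" layout used in this document; the regular expression and the function name `parse_summary` are illustrative and are not part of the original analysis. The integer percentages printed in the listings may differ by a point from the recomputed values, depending on how the reporting program rounds.

```python
import re

# Matches "Identities = 815/833", "Positives = 827/833", "Gaps = 15/428".
# Hypothetical helper pattern; tolerant of extra spaces around "=" and "/".
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, length, percent)} with percent recomputed from the counts."""
    out = {}
    for name, count, length in _FIELD.findall(line):
        count, length = int(count), int(length)
        out[name] = (count, length, 100.0 * count / length)
    return out


stats = parse_summary("Identities = 815/833 (97%) , Positives = 827/833 (98%)")
for name, (count, length, percent) in stats.items():
    print(f"{name}: {count}/{length} = {percent:.2f}%")
```

Recomputing the percentage from the counts gives a quick check on a scanned listing, since a misread digit in either the count or the printed percentage shows up as a disagreement between the two.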

Example 2034 

A DNA sequence (GBSx2145) was identified in S.agalactiae <SEQ ID 6289> which encodes the amino
acid sequence <SEQ ID 6290>. This protein is predicted to be KIAA1074 protein. Analysis of this protein
sequence reveals the following:

Possible site: 35 

>>> Seems to have an uncleavable N-term signal seq 


Final Results 

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>


A related GBS nucleic acid sequence <SEQ ID 8961> which encodes amino acid sequence <SEQ ID 8962>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 19
Peak Value of UR: 2.86
Net Charge of CR: 4
McG: Discrim Score: 10.27
GvH: Signal Score (-7.5): -3.61
Possible site: 31
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 0 value: 2.12 threshold: 0.0
PERIPHERAL Likelihood = 2.12 7
modified ALOM score: -0.92

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8962 (GBS117) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 29 (lane 8; MW 22.5kDa). 

GBS117-His was purified as shown in Figure 200, lane 7.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 2035

A DNA sequence (GBSx2146) was identified in S.agalactiae <SEQ ID 6291> which encodes the amino
acid sequence <SEQ ID 6292>. This protein is predicted to be YirC (resE). Analysis of this protein 
sequence reveals the following: 

Possible site: 28

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.88 Transmembrane 177 - 193 ( 173 - 196)
INTEGRAL Likelihood = -4.09 Transmembrane 10 - 26 ( 5 - 29)

Final Results

bacterial membrane Certainty=0.5352 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15292 GB:Z99120 similar to two-component sensor histidine
kinase [YvqA] [Bacillus subtilis]
Identities = 108/379 (28%), Positives = 193/379 (50%), Gaps = 33/379 (8%)



Query: 92 DOTKKESHDIIRYLTQKRLWQISKEKDGMFVTIKKKTYYVMTKDYSGILVDGSIKKVPKA 151

+N + S + L+ + ++ K D KKK Y + D +G V IKK
Sbjct: 86 ENEEASSDKDLSILSSSFIHKVYKLADKQ- -EAKKKRY SADVNGEKVFFVIKKGLSV 140

Query: 152 QSOLFHVINFS DITYTQHLITKINHFLIVILVLTYIPMLFIMRKTFTGIRESIQ 205

Q +++++ D+ YT L ++ + V+++L++IP +++ + + +

Sbjct: 141 NGQSAMMLSYALDSYRDDLAYT— LFKQLLFIIAWILLSWIPAIWLAKY LSRPLV 194

Query: 206 SVQTYISSLWKNQGNHQSSQKEIVFSDFDPLLLESQEMANRIYQAEESQRNFFQNASHEL 265

S + ++ + +++ K + L +EM ++ Q +E++R QN SH+L

Sbjct: 195 SFEICHVKRI--SEQDWDDPVKVDRKDEIGKLGErriEEMRQKLVQKDETERTLLQNISHDL 252

Query: 266 RTPLMSIQGYTEGVQEGII DAELAHSVILQESKKMKQLVDDIILLSKLD- -SNLSDQ 320

+TP+M I+GYT+ +++GI D E VI E+ K+++ + D++ L+KLD + Q
Sbjct: 253 KTPV1WIRGYTQSIKDGIFPKGDLENTVDVIECEALKLEKKIKDLLYLTKLDYLAKQKVQ 312

Query: 321 KDEFSLNELIiNSIIAYFKPLiANKQKISITYRPDKHEKLLK-GNEELIQRAIHNILSNALR 379 

D FS4- E+ +1 K A K+ +++ D E +L G+ E + + NIL N +R 
Sbjct: 313 HDMFSIVEVTEEVIERLK-WARKE LSWEIDVEEDILMPGDPEQWNKLLENILENQIR 368 

Query: 380 YAVSHIEISYT NQKLTISNDGPAISKEDLPYIFDRFYKGHGGQTGIGLAMTKEIIK 435 

YA + IEIS N +TI NDGP I EL +++ F KG G+ GIGL++ K 1+ 

Sbjct: 369 YAETKIEISMKQDDRNIVITIKNDGPHIEDSMLSSLYEPFNKGKKGEFGIGLSIVKRILT 428 

Query: 436 QHHGNI IAESDSTSTTFTI 454 

H +1 E+D T ++ I 
Sbjct: 429 LHKASISIENDKTGVSYRI 447 

There is also homology to SEQ ID 1178. 

SEQ ID 6292 (GBS279) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 7; MW 54.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 6; MW 79.4kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
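
The "Final Results" block of each localisation report lists one certainty value per compartment, and the predicted location is the compartment with the highest value. The following is a minimal sketch of how such a block could be read programmatically; the regular expression and the function name `best_localisation` are assumptions made for illustration and do not come from the prediction program itself.

```python
import re

# Matches lines such as "bacterial membrane --- Certainty=0.5352 (Affirmative)".
# The number group tolerates stray spaces, which scanning can introduce
# (for example "0. 5352" or "0 . 0000").
_RESULT = re.compile(r"bacterial\s+(\w+)[\s-]*Certainty\s*=\s*([\d.\s]+?)\s*\(")


def best_localisation(report):
    """Return (compartment, certainty) for the highest certainty, or None if no line matches."""
    scores = []
    for compartment, raw in _RESULT.findall(report):
        scores.append((compartment, float(raw.replace(" ", ""))))
    return max(scores, key=lambda item: item[1]) if scores else None


report = """
bacterial membrane Certainty=0.5352 (Affirmative)
bacterial outside Certainty=0.0000 (Not Clear)
bacterial cytoplasm Certainty=0.0000 (Not Clear)
"""
print(best_localisation(report))
```

When several compartments share the same certainty (for instance, all 0.0000), this sketch returns the first one listed, so a tie should be read as "not clear" rather than as a prediction.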
Example 2036

A DNA sequence (GBSx2147) was identified in S.agalactiae <SEQ ID 6293> which encodes the amino 
acid sequence <SEQ ID 6294>. This protein is predicted to be two-component response regulator (mtrA). 
Analysis of this protein sequence reveals the following: 

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1706 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10239> which encodes amino acid sequence <SEQ ID
10240> was also identified.



The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB05663 GB:AP001513 two-component response regulator [Bacillus halodurans] 
Identities = 87/220 (39%), Positives = 124/220 (55%), Gaps = 4/220 (1%) 

Query: 11 IYFADDEKNIRDLWPFLEHDGFTVRAFETGDLLLEAYKNQKPDLVII J DI^MPG , ^NGLDV 70 
I DDE ++R+LV +L +GF V ETGD ++ + + DLV+LD+MM +G

Sbjct: 7 ILIVDDELDI^ELVTSYLRKEGFAVYTAETGDEAIKRLEQEPMDLVVIJDvMMDEMDGFTA 66 

Query: 71 MKS I RQYDN1 P 1 1 MLTARDSDVDFITAFNLGTDDYFTKPFSPI KLSLHVKALFKRLDEKA 130 
K IR + IPIIMLTAR + D + +G DDY KPFSP +L ++ +R 
Sbjct: 67 CKEIRAFSQIPIIMLTARGGEDDKVMGLQIGADDYIVKPFSPRELVARIEVAIiRRTQGIQ 126



Query: 131 IKNDTQYQFLDLTLDTEKRIALLSNEEMPLTKT3FDFLLVLIEKPETAFSRETLLKRIWG 190 

+DT Y+F +L + R ++ +E+ LTK E+D L+ L+E F+RE L +R+WG 

Sbjct: 127 C^TODTGYRFNELRIQPSGRKVFVNGQEISLTKKEYDLLVFLLEHRGRVFTREHLHDRLWG 186 

Query: 191 FDDIES--RAVDDTIKRLRKKFKQYHSQVSIKTVWGYGFK 228 

D+ RVDIKLRKK + IKTVWG G+K 
Sbjct: 187 MDTQQGTLRTVDTHI KTLRLKLKP - - ADRFI KTVWGVGYK 224 

There is also homology to SEQ ID 3260. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
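
Each alignment row carries its own start and end coordinates, so the number of residues in the row (gap characters excluded) should equal end - start + 1. The following is a minimal sketch of that check, which can help flag rows damaged during scanning; the row pattern and the function name `row_is_consistent` are assumptions made for illustration.

```python
import re

# A row looks like "Query: 191 FDDIES--RAVDD...GFK 228": label, start, sequence, end.
# The sequence must be a single token; rows broken up by stray spaces do not match.
_ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def row_is_consistent(line):
    """True if the residue count (gaps excluded) equals end - start + 1."""
    m = _ROW.match(line.strip())
    if m is None:
        return False
    start, seq, end = int(m.group(2)), m.group(3), int(m.group(4))
    residues = sum(1 for ch in seq if ch.isalpha())
    return residues == end - start + 1


print(row_is_consistent("Query: 191 FDDIES--RAVDDTIKRLRKKFKQYHSQVSIKTVWGYGFK 228"))
```

A row that fails the check is not necessarily wrong in the underlying data; it only indicates that the printed characters and the printed coordinates no longer agree.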



Example 2037 

A DNA sequence (GBSx2148) was identified in S.agalactiae <SEQ ID 6295> which encodes the amino
acid sequence <SEQ ID 6296>. Analysis of this protein sequence reveals the following:

Possible site: 55 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.18 Transmembrane 1558 -1584 (1568 -1585)
INTEGRAL Likelihood = -0.16 Transmembrane 338 - 354 ( 338 - 354)

Final Results

bacterial membrane Certainty=0.1871 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10237> which encodes amino acid sequence <SEQ ID 
10238> was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 
>GP:AAG09771 GB:AF243528 cell envelope proteinase [Streptococcus thermophilus]
Identities = 797/1594 (50%), Positives = 1056/1594 (66%), Gaps = 39/1594 (2%)

Query: 21 IWTKQRFSIRKYKLGAVSVLLGTLFFLGGITNVAflDSVINKPSDIAVBQQVKDSPTS-IA 79 

M K+ FS+RKYK+G VSVLLG +F G +VAAD + + + VEVD+SA 
Sbjct: 1 MKKICETFSLRKYKIGWSVLIjGAVFLFAGAPSVAAEELTSLV-ETKVEATVPDAIVSESA 59 

Query: 80 NETPT™--TSSALASTAQDNLVTKANNSPTETQPVAESHSQATETFSPVANQPVESTQE 137 

+E+P +++ +T+ D T ++ + S + ET P P S ++ 

Sbjct: 60 SESPWEELVDTSVEATSTDVTTTDNEEKTPGSEALBNSANTEVETTQPAVETPAISEKK 119 

Query: 138 VSKTPLTKQNLAVKSTPAISKET- - PQNIDSNKIITVPKVWNTGYKGEGTWAIIDSGLD 195 

V + K ++A ++T +4-E PQWIDSN IITVPKVW +GYKGEGTWAI I DSGLD 
Sbjct: 120 VEEEE- -KLSVADETTAITNQEEAKPQNIDSNTIITVPKVVJYSGYKGEGTWAIIDSGLD 177 

Query: 196 INHDALQLNDSTKAKYQKTEQQr^JAAKAKAGINYGKKYNNK^IFGHNYVDVOT 255 

++HD L ++D + AKY++E+++ AAK AGI YG+W+N+KV+FG+NYVDVNT LKE 
Sbjct: 178 VDHDVl.HISDLSTAKYKSEKEIFAAKEAAGITYGEWFKDKOTFGYNYvT)vim7LKEEDKR 237 



+L+YGVAPEAQVMFMRVFSD K TG ALYVKAIEDAVKL 
Sbjct: 238 SHGMHVTSIATGNPTQPVAGQLMYGVAPFAQVMEM^VFSDLKATTGAALYVKAIEDAVKL 297 

Query: 316 GADSINLSLGGANGSLWADDRLIKALEMARLAGVSWIAAGNDGTFGSGASKPSALYPD 375 

GADSINLSLGGANGS+VN ++ + A+E AR AGVSWIAAGNDGTFGSG S PSA YPD 
Sbjct: 298 GADSINLSLGGANGSVVNMMNVTAAIEAARRAGVSWIAAGNDGTFGSGHSNPSADYPD 357 

Query: 376 YGLVGSPSTARFAISVASYNNTTLVNKVFNIIGLENNRNLNNGIAAYADPKVSDKTFEVG 435 

YGLVG+PSTA +AI SVAS YNNTT+ +KV NIIGLENN +LN G +++ +P+ S FE+G 
Sbjct: 358 YGLVGAPSTAHDAISVASYNimVGSKyiNIIGLENNADLMYGKSSFDNPEKSPVPFEIG 417 

Query: 436 KQYDWFVGKGNDNDYKDKTL^raKIALIERGDITFTKKOTNAIKlGAVGAIIFNNKAGEA 495 

K+Y+YV+ G G +D+ L GK+ALI+RG ITF++K+ HA GAVG +IFN++ GEA 
Sbjct: 418 KEYEYVYAGIGQASDFDGLDLTGKLALIKRGTITFSEKIANATAAGAVGWIFNSRPGEa 477 

Query: 496 

Sbjc 

Query: 556 DGQLKPDLSAPGGSIYAAINDNEYDMMSGTSMASPHVAGATALVKQYLLKEHPELKKGDI 615 

DG+LKPDL+APGG+ 1 YAAINDN+Y M GTSMASPHVAGA LVKQYLL +P +1 
Sbjct: 538 DGELKPDLARPGGAIYAAIITONDYAM^QGTSMASPHVAGAAVLVKQYLLATYPTKSPQEI 597 

Query: 616 ERTVKYLLMSTAKAHLNKDTGAYTSPRQQGAGIIDVAAAVQTGLYLTGGENNyGSVTLGN 675 

E VK+LLMSTAKAH+NK+T AYTSPRQQGAGIID AAA+ TGLYLT GE4 YGS+TLGN 
Sbjct: 598 EALVKHLLMSTAKAHVNKETTAYTSPRQQGAGIIDTAAAISTGLYLT-GEDGYGSITLGN 656 



Sbjct: 657 VEDTFSFTVTLHNITITODKTI^STQLTTDTAQKRIDHLGSTSISRDSWRKVTVKANSST 716 

Query: 736 TITIDIDVSKYHDMLKKVMPNGYFLEGYVRFTDPVDGGEVLSIPYVGFKGEFQNLEVLEK 795 

T+TI++D S + + L +M NGY+LEG+VRFTD D G+++SIPYVGF+GEFQNL VLE+ 
Sbjct: 717 TVT I NVDAS S FAEELTGLM KNGYYLEGFVRFTDVADDGD I VS I PYVGFRGE FQNLAVLEE 776 

Query: 796 SIYKLVANKEKGFYFQP- -KQTMEVEGSEDYTAIMTTSSEPIYSTDGTSPIQLKALGSYK 853 

IY L+A+ + GFYF+P Q N V S YT L+T S+E IYSTD S +K LG++K 
Sbjct: 777 PIYNLIADGKGGFYFEPVTAQPOTVDISHHYTGLVTGSTELIYSTDKRSDSAIKTLGTFK 836 

Query: 854 SIDGKMILQLDQKGQPHIAISPHDDQNQDAVAWGVFLRMFNl^RAKvYRADDVNLQKPL 913 

+ G ++L+M3+ G+PHLAISPN D NQD++ KGVFLRN+ +L A VY ADD PL 
Sbjct: 837 HKAGYFVLELDESGKPHLAISPNGDDNQDSLVFKGVFLRNYTDLVASVYAADDTERTNPL 896 



Query: 974 QQMVFDITLDRQAPTLTTATYDKDRRIFKARPAVEKGESGIFREQVFYLKKDKDGHYNSV 1033 
Query: 1034 LRQQGEDC3ILVEDNKVFIKQEECDGSFILPKEVNDFSHVYYTVEDYAGNLVSAKLEDLINI 1093
+ V DNKVF+ Q DGSF LP ++ D S YYTVEDYAGN+ K+E+LI+I

Sbjct: 1016 PSLLI<NGDVWSDNI<VFVAQNDDGSFTLPLDIMISKFYYTVEDYAGNISYEICVENLISI 1075 

Query: 1094 GNKNGLVNVKVFSPELNSNVDIDFSYSVKDDKGNIIKK-QHHGKDLNLLKLPFGTYTFDL 1152 
GN+ GLV V + + NS V I FSYSV D+ G 1+ + + D ++LKLPFGTYTFDL 
Sbjct: 1076 GlffiKGLVTVNILDKDTNSPVPILFSYSVTDETGKIVAELPRYAGDTSVLKLPFGTYTFDL 1135

Query: 1153 FLYDEERAISn^ISPKSVTVTISEKDSLKDVLFKVNLLKKAALLVEFDKLLPKGATVQLVTK 1212 

FLYD E ++L VTI E +S +V F V L KA LL++ D LLP G+T+QLVT 

Sbjct: 1136 FLYDTEWSSLAGETKAWTILe:DNSTAEVNFYVTLKDKANLLIDIDALLPSGSTIQLVTA 1195 


Query: 1213 TNTVVDLPKATYSPTDYGKNIPVGDYRLNVTLPSGYSTLENLDDLLVSVKEDQVNLTKLT 1272

+ LP A YS TDYGK +PVG Y + TLP GY LE LD V+V +Q N+ KLT 
Sbjct: 1196 DGQAIQLPNAKYSKTDYGKFVPVGTYTILPTLPEGYEFLEELD VAVLANQSNVKKLT 1252 

Query: 1273 LINKAPLINALAEQTDIITQPVFYNAGTHLKNNYLAMLEKAQTLIKNRVEQTSIDKAIAA 1332

LINK L +AE + +YNA L+ Y LE A + N+ Q +D+A+A+ 

Sbjct: 1253 LINKVALKELIAEIAGLEETARYYNASPELQTAYAKALEDANAVYANKHNQAQVDSALAS 1312 

Query: 1333 LRESRQALNGKETDTSLiLAKAI LAETE I KGNYQFVNAS PLSQSTYINQVQLAKNLLQKPN 1392 
L +R+ LNG+ TD L + T 4- N+ + NA Q Y V+ A+ +L + N

Sbjct: 1313 LVAAREQLNGQATDKEKLIAEVSNYTPTQANFIYYNAENTKQIAYDTAVRSAQLVLNQEN 1372 

Query: 1393 OTQSEVDKALEmDIAKNQLNGHETDYSGLHHMIIKANVLKQTSSKYQNASQFAKENYNN 1452 
VTQ+ V++AL +L AK L+G +TD S L + ++VLK T +KY NAS+ K+ Y+ 
Sbjct: 1373 OTQAVVNQAI^IjLAAKANLDGQKTDISALRSAVSVSSVLKATDAKYLNASENVKQAYDQ 1432

Query: 1453 LIKKAELLLSNRQATQAQVEELLNQIKATEQELDG RDRVSSAENYSQSLNDNDSLN 1508 

++ A+ +L + A+QA V++ L + + + ELDG + N + D ++ 

Sbjct: 1433 AVEAAKAILVDESASQASVEQAIA\TLTSAQAELDGVATSTNDAKEPANTATDKKDEGTVT 1492 


Query: 1509 TTPIN PP NQPQALIFKKGMTKESEVAQKRVLGVTSQTDNQKVKTNKL 1555 

PI+ PP N I +K + + + L + + NQ+ + +L 

Sbjct: 1493 PPPIDSEIVDVQAPPVKDTGNSEHVPIGQK-PNPQPTLPRPVTLQASLSSPNQEKQVTQL 1551 

P TGE+ K 
Sbjct: 1552 PNTGENDTK- - 

A related GBS gene <SEQ ID 8963> and protein <SEQ ID 8964> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
SRCFLG: 0
McG: Length of UR: 1
Peak Value of UR: 2.55
Net Charge of CR: 4
McG: Discrim Score: 2.60
GvH: Signal Score (-7.5): -0.78
Possible site: 35
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 36
ALOM program count: 1 value: -0.16 threshold: 0.0
INTEGRAL Likelihood = -0.16 Transmembrane 318 - 334 ( 318 - 334)
PERIPHERAL Likelihood = 2.54 1161
modified ALOM score: 0.53
icml HYPID: 7 CFP: 0.106

*** Reasoning Step: 3 

Final Results 

bacterial membrane -

bacterial outside - 



bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
LPXTG motif: 1535-1539 

The protein has homology with the following sequences in the databases: 

50.5/67.5% over 1583aa Streptococcus thermophilus
GP|9963932| cell envelope proteinase Insert characterized
ORF01603 (361 - 5070 of 5370)
GP|9963932|gb|AAG09771.1|AF243528_1|AF243528(1 - 1584 of 1585) cell envelope proteinase {Streptococcus thermophilus}
%Match =41.2 

%Identity = 50.4 %Similarity = 67.4 

Matches = 794 Mismatches = 498 Conservative Sub.s = 267 



KNALGTVLNLPQNNL* * KFRKL* KILI FYVLIVFVI IMLQEKE I FMKTKQRFSIRKYKLGAVS VLLGTLPFLGGITNVAA 
I h 11=1111=1 llllll = l = = I =111 
MKKKETFSLRKYKIGTVSVLLGAVFLFAGAPSV7AA 



DSVINKPSDIAVEQQVKDSPTS - IANETPT- -NNTSSALASTAQDNLVTKAHHSPTETQPVAESHSQATETFSPVANQPV 
| : = || | |: | |:|:| ::: =|: | | :: : | : || I I 

DE-LTSLVETKVFAWPDAIVSESASESPVVEELVDTSVFATSTDVTTTDNEEETPGSF^ENSANTEVETTQPAVETPA 



130 140 150 160 170 180 190 

960 990 1020 1050 1080 1110 1140 1170 

YQNEQQMNAAKAKAGINYGKWYNNKVI FGHNYVDYNTELKEVKSTSHGMHVTS IATANPSKKDTNELI YGVAPEAQVMFM 
l = :| = = = Ml III I I s | : | : | | : | | : | | | | | | | ||| lllllllllll ll = = =1 = 111111111111 

YKSEKEIEKAKEAAGITYGEWFiroKOTFGY^ 

210 220 230 240 250 260 270 

1200 1230 1260 1290 1320 1350 1380 1410 

RVFSDEJCRGTGPALWPCAIEDAVKLGADSINLSLGGAWGSIjVNADDRLIKALEMARLAGVSVVIAAGNDGTFGSGASKPS 

iiiii i ii 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 = 1 1 := ■- 1 = 1 ii ii ii i minimi 1 1 i n 



1440 1470 1500 153,0 1560 1590 1620 1650 

ALYPDYGLVGSPSTARFAISVASYNOTTLVNKVFNIIGLENN^ 

i iiiiiiii = iiii =iiiiiiimi= =n mum =11 i = = = =1= i 11 = 11 = 1 = 11= 1 1 =1 

ADYPDYGLVGAPSTAHDAISVASYWNTWGSKVIWIIGLENNADLNYGKSSFDNPEKSPVPFEIGKEYEYVYAGIGQASD 



1680 1710 1740 1770 1800 1830 1860 1890 

YKDKTLNGKIALIERGDITFTKKWWAINHGAVGAI I FNNKAGEANLTMSLDPSASAI PAI FTQKEFGD VLAKNNYKI VF 

= i imimi iii = = i= ii mi =m = = nm=i n i mm im n mn i 

FDGLDLTGKIALIKRGTITFSEKIANATAAGAVGWIFNSRP3EANVSMQLDDTAIAIPSVFIPLEFGEALAANSYKIAF 
450 460 470 480 490 500 510 

1920 1950 1980 2010 2040 2070 2100 2130 

OTIKl^QANPNAGvLSDFSSWGLTADGQLKPDLSAPGGSIYAAINDNEYDmSGTSMASPHVAGATALVKQYLLKEHPEL 

ii = = ii mnmiiimimmmiihiiiimm i inimmii mini =i 

MTOTDIRPNPEAGLLSDFSSWGLSADGELKPDIAAPGGAIYAAINDNDYANMO^TSMASPHvAGAAvLW 

530 540 550 560 570 580 590 



